Re: [PERFORM] Query Plan Performance on Partitioned Table

2015-08-12 Thread Rural Hunter
I tried to add index on partition key and it didn't help. we have
autovacuum running. The updates and inserts are very frequent on these
tables. The server kernel version is 3.5.0-22-generic. It has 376G memory.

max_connections = 2500# (change requires restart)
shared_buffers = 32GB# min 128kB
work_mem = 8MB# min 64kB
maintenance_work_mem = 20GB# min 1MB

We usually have around 400 active connections on the db. Most of them are
idle. There are about 100 connections are in active status and I can see
most of the time they are in 'BIND' status in ps command.

We have heavy IO load on the disk of the default tablespace where I believe
table statistics tables are in. Will that impact the query planing greatly?


2015-08-12 15:00 GMT+08:00 Pietro Pugni pietro.pu...@gmail.com:

 You can give it a try only on that partition just to see if your query
 plan gets better. I prefer defining partitioning over ranging attributes
 like, for example: cid between 123 and 456. It makes more sense, especially
 when there are attributes which value strictly depends on the check
 attribute. Btw, dozens of millions is not a problem on modern systems. I
 remember of reading about a recommended 20 millions per partition but I
 usually work with 60 millions per partition without any problem.

 Do you autovacuum? How frequently do the updates and insert operations
 occur?
 Give us your configuration about work_mem, shared_buffers, max_connections
 etc. Kernel version? If possible avoid 3.2 and 3.8-3.13. Also think to
 upgrade your OS version.

 From today I'm on vacancy, so others could help :)

 Pietro Pugni
 Il 12/ago/2015 03:49, Rural Hunter ruralhun...@gmail.com ha scritto:

 article_729 has about 0.8 million rows. The rows of the children tables
 are variance from several thousands to dozens of millions. How can it help
 to create index on the partition key?

 2015-08-12 1:03 GMT+08:00 Pietro Pugni pietro.pu...@gmail.com:

 Hi Rural Hunter,
 Try to create an index on cid attribute.
 How many rows has article_729?

 Pietro Pugni
 Il 11/ago/2015 16:51, Rural Hunter ruralhun...@gmail.com ha scritto:

 yes i'm very sure. from what i observed, it has something to do with
 the concurrent query planing. if i disconnect other connections, the plan
 is very quick.

 2015-08-11 22:42 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com:



 Check constraints:
 article_729_cid_check CHECK (cid = 729)



 Used partition schema looks very simple and straightforward, and
 should have no issues with 80 partitions.
 Are you sure that you have only 80 partitions but not (lets say) 800?
 Are every other partition of the article table use the same general
 idea of partition check (cid=something)?


 Maxim Boguk
 Senior Postgresql DBA
 http://www.postgresql-consulting.ru/
 http://www.postgresql-consulting.com/

 Phone RU: +7 910 405 4718
 Phone AU: +61 45 218 5678

 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
 Skype: maxim.boguk
 Jabber: maxim.bo...@gmail.com
 МойКруг: http://mboguk.moikrug.ru/

 People problems are solved with people.
 If people cannot solve the problem, try technology.
 People will then wish they'd listened at the first stage.






[PERFORM] Query Plan Performance on Partitioned Table

2015-08-11 Thread Rural Hunter
Platform: pg 9.2.9 on Ubuntu 12.04.4 LTS.
I have a table which is partitioned to about 80 children. There are usualy
several dozens of connections accessing these tables concurrently. I found
sometimes the query planing time is very long if I query against the parent
table with partition key. The connections are shown with status 'BIND' by
ps command.

In normal condition, the plan time of the query is about several hundred of
million seconds while the same query accessing child table directly is less
than 1 million seconds:
# explain select 1 from article where cid=729 and
url_hash='6851f596f55a994b2df417b53523fe45';
 QUERY
PLAN

 Result  (cost=0.00..8.68 rows=2 width=0)
   -  Append  (cost=0.00..8.68 rows=2 width=0)
 -  Seq Scan on article  (cost=0.00..0.00 rows=1 width=0)
   Filter: ((cid = 729) AND (url_hash =
'6851f596f55a994b2df417b53523fe45'::bpchar))
 -  Index Scan using article_729_url_hash on article_729 article
(cost=0.00..8.68 rows=1 width=0)
   Index Cond: (url_hash =
'6851f596f55a994b2df417b53523fe45'::bpchar)
   Filter: (cid = 729)
(7 rows)

Time: 361.401 ms

# explain select 1 from article_729 where
url_hash='6851f596f55a994b2df417b53523fe45';
 QUERY
PLAN
-
 Index Only Scan using article_729_url_hash on article_729
(cost=0.00..8.67 rows=1 width=0)
   Index Cond: (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar)
(2 rows)

Time: 0.898 ms

This is only in normal condition. In extreme condition, the planing time
could take several minutes. There seems some locking issue in query
planing. How can I increase the plan performance? Or is it bad to partition
table to 80 children in PostgreSQL?


Re: [PERFORM] Query Plan Performance on Partitioned Table

2015-08-11 Thread Rural Hunter
# \d article_729
 Table public.article_729
Column|Type |
Modifiers
--+-+---
 aid  | bigint  | not null default
nextval('article_aid_seq'::regclass)
 style| smallint| not null default 0
 oaid | bigint  | default 0
 fid  | integer |
 bid  | integer | default 0
 cid  | integer |
 tid  | integer |
 url  | text| default NULL::bpchar
 tm_post  | timestamp without time zone |
 tm_last_rply | timestamp without time zone |
 author   | character varying(100)  | default NULL::bpchar
 title| character varying(255)  | default NULL::bpchar
 content  | text|
 ab_content   | text|
 rply_cnt | integer |
 read_cnt | integer |
 url_hash | character(32)   | not null
 hash_plain   | text| default NULL::bpchar
 title_hash   | character(32)   | default NULL::bpchar
 guid | character(32)   | default NULL::bpchar
 neg_pos  | smallint| not null default 0
 match_code   | character(32)   | default NULL::bpchar
 tm_spider| timestamp without time zone |
 tm_update| timestamp without time zone |
 stage| smallint| not null default 0
 rply_cut | integer | not null default 0
 read_cut | integer | not null default 0
 src  | integer | default 0
 rfid | integer |
 labels   | integer[]   |
 kwds | integer[]   |
 like_cnt | integer |
Indexes:
article_729_pkey PRIMARY KEY, btree (aid), tablespace indextbs
article_729_url_hash UNIQUE CONSTRAINT, btree (url_hash), tablespace
indextbs
article_729_bid_titlehash_idx btree (bid, title_hash), tablespace
indextbs
article_729_fid_idx btree (fid), tablespace indextbs
article_729_guid_idx btree (guid), tablespace indextbs
article_729_labels_idx gin (labels), tablespace data1tbs
article_729_mtcode_idx btree (match_code), tablespace indextbs
article_729_rfid_author_idx btree (rfid, author), tablespace
indextbs
article_729_stage_idx btree (stage), tablespace data1tbs
article_729_time_style_idx btree (tm_post DESC, style), tablespace
data1tbs
article_729_tm_spider_idx btree (tm_spider), tablespace indextbs
article_729_tm_update_idx btree (tm_update), tablespace data1tbs
Check constraints:
article_729_cid_check CHECK (cid = 729)
Foreign-key constraints:
article_729_cid_fk FOREIGN KEY (cid) REFERENCES company(cid) ON
DELETE CASCADE
Triggers:
trg_article_729_delete AFTER DELETE ON article_729 FOR EACH ROW EXECUTE
PROCEDURE fn_article_delete()
trg_article_729_insert AFTER INSERT ON article_729 FOR EACH ROW EXECUTE
PROCEDURE fn_article_insert()
trg_article_729_update AFTER UPDATE ON article_729 FOR EACH ROW EXECUTE
PROCEDURE fn_article_update()
Inherits: article

2015-08-11 21:53 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com:



 On Tue, Aug 11, 2015 at 11:44 PM, Rural Hunter ruralhun...@gmail.com
 wrote:

 # \dt+
 ​​
 article_729
   List of relations
  Schema |Name | Type  | Owner  |  Size  | Description
 +-+---+++-
  public | article_729 | table | omuser1 | 655 MB |
 (1 row)
 The problem exists on not only this specific child table, but with all of
 them.


 ​Oops sorry, оf course I mean \d+​
  ​ article_729
 ​ (to see criteria used for partitioning).​



 --
 Maxim Boguk
 Senior Postgresql DBA
 http://www.postgresql-consulting.ru/
 http://www.postgresql-consulting.com/



Re: [PERFORM] Query Plan Performance on Partitioned Table

2015-08-11 Thread Rural Hunter
), tablespace data1tbs
Check constraints:
article_729_cid_check CHECK (cid = 729)
Foreign-key constraints:
article_729_cid_fk FOREIGN KEY (cid) REFERENCES company(cid) ON
DELETE CASCADE
Triggers:
trg_article_729_delete AFTER DELETE ON article_729 FOR EACH ROW EXECUTE
PROCEDURE fn_article_delete()
trg_article_729_insert AFTER INSERT ON article_729 FOR EACH ROW EXECUTE
PROCEDURE fn_article_insert()
trg_article_729_update AFTER UPDATE ON article_729 FOR EACH ROW EXECUTE
PROCEDURE fn_article_update()
Inherits: article
Has OIDs: no

2015-08-11 22:00 GMT+08:00 Rural Hunter ruralhun...@gmail.com:

 # \d article_729
  Table public.article_729
 Column|Type |
 Modifiers

 --+-+---
  aid  | bigint  | not null default
 nextval('article_aid_seq'::regclass)
  style| smallint| not null default 0
  oaid | bigint  | default 0
  fid  | integer |
  bid  | integer | default 0
  cid  | integer |
  tid  | integer |
  url  | text| default NULL::bpchar
  tm_post  | timestamp without time zone |
  tm_last_rply | timestamp without time zone |
  author   | character varying(100)  | default NULL::bpchar
  title| character varying(255)  | default NULL::bpchar
  content  | text|
  ab_content   | text|
  rply_cnt | integer |
  read_cnt | integer |
  url_hash | character(32)   | not null
  hash_plain   | text| default NULL::bpchar
  title_hash   | character(32)   | default NULL::bpchar
  guid | character(32)   | default NULL::bpchar
  neg_pos  | smallint| not null default 0
  match_code   | character(32)   | default NULL::bpchar
  tm_spider| timestamp without time zone |
  tm_update| timestamp without time zone |
  stage| smallint| not null default 0
  rply_cut | integer | not null default 0
  read_cut | integer | not null default 0
  src  | integer | default 0
  rfid | integer |
  labels   | integer[]   |
  kwds | integer[]   |
  like_cnt | integer |
 Indexes:
 article_729_pkey PRIMARY KEY, btree (aid), tablespace indextbs
 article_729_url_hash UNIQUE CONSTRAINT, btree (url_hash), tablespace
 indextbs
 article_729_bid_titlehash_idx btree (bid, title_hash), tablespace
 indextbs
 article_729_fid_idx btree (fid), tablespace indextbs
 article_729_guid_idx btree (guid), tablespace indextbs
 article_729_labels_idx gin (labels), tablespace data1tbs
 article_729_mtcode_idx btree (match_code), tablespace indextbs
 article_729_rfid_author_idx btree (rfid, author), tablespace
 indextbs
 article_729_stage_idx btree (stage), tablespace data1tbs
 article_729_time_style_idx btree (tm_post DESC, style), tablespace
 data1tbs
 article_729_tm_spider_idx btree (tm_spider), tablespace indextbs
 article_729_tm_update_idx btree (tm_update), tablespace data1tbs
 Check constraints:
 article_729_cid_check CHECK (cid = 729)
 Foreign-key constraints:
 article_729_cid_fk FOREIGN KEY (cid) REFERENCES company(cid) ON
 DELETE CASCADE
 Triggers:
 trg_article_729_delete AFTER DELETE ON article_729 FOR EACH ROW
 EXECUTE PROCEDURE fn_article_delete()
 trg_article_729_insert AFTER INSERT ON article_729 FOR EACH ROW
 EXECUTE PROCEDURE fn_article_insert()
 trg_article_729_update AFTER UPDATE ON article_729 FOR EACH ROW
 EXECUTE PROCEDURE fn_article_update()
 Inherits: article

 2015-08-11 21:53 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com:



 On Tue, Aug 11, 2015 at 11:44 PM, Rural Hunter ruralhun...@gmail.com
 wrote:

 # \dt+
 ​​
 article_729
   List of relations
  Schema |Name | Type  | Owner  |  Size  | Description
 +-+---+++-
  public | article_729 | table | omuser1 | 655 MB |
 (1 row)
 The problem exists on not only this specific child table, but with all
 of them.


 ​Oops sorry, оf course I mean \d+​
  ​ article_729
 ​ (to see criteria used for partitioning).​



 --
 Maxim Boguk
 Senior Postgresql DBA
 http://www.postgresql-consulting.ru/
 http://www.postgresql-consulting.com/





Re: [PERFORM] Query Plan Performance on Partitioned Table

2015-08-11 Thread Rural Hunter
yes i'm very sure. from what i observed, it has something to do with the
concurrent query planing. if i disconnect other connections, the plan is
very quick.

2015-08-11 22:42 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com:



 Check constraints:
 article_729_cid_check CHECK (cid = 729)



 Used partition schema looks very simple and straightforward, and should
 have no issues with 80 partitions.
 Are you sure that you have only 80 partitions but not (lets say) 800?
 Are every other partition of the article table use the same general idea
 of partition check (cid=something)?


 Maxim Boguk
 Senior Postgresql DBA
 http://www.postgresql-consulting.ru/
 http://www.postgresql-consulting.com/

 Phone RU: +7 910 405 4718
 Phone AU: +61 45 218 5678

 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
 Skype: maxim.boguk
 Jabber: maxim.bo...@gmail.com
 МойКруг: http://mboguk.moikrug.ru/

 People problems are solved with people.
 If people cannot solve the problem, try technology.
 People will then wish they'd listened at the first stage.




Re: [PERFORM] Query Plan Performance on Partitioned Table

2015-08-11 Thread Rural Hunter
# \dt+ article_729
  List of relations
 Schema |Name | Type  | Owner  |  Size  | Description
+-+---+++-
 public | article_729 | table | omuser1 | 655 MB |
(1 row)
The problem exists on not only this specific child table, but with all of
them.

2015-08-11 19:43 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com:



 On Tue, Aug 11, 2015 at 6:46 PM, Rural Hunter ruralhun...@gmail.com
 wrote:

 Platform: pg 9.2.9 on Ubuntu 12.04.4 LTS.
 I have a table which is partitioned to about 80 children. There are
 usualy several dozens of connections accessing these tables concurrently. I
 found sometimes the query planing time is very long if I query against the
 parent table with partition key. The connections are shown with status
 'BIND' by ps command.

 In normal condition, the plan time of the query is about several hundred
 of million seconds while the same query accessing child table directly is
 less than 1 million seconds:
 # explain select 1 from article where cid=729 and
 url_hash='6851f596f55a994b2df417b53523fe45';
  QUERY
 PLAN

 
  Result  (cost=0.00..8.68 rows=2 width=0)
-  Append  (cost=0.00..8.68 rows=2 width=0)
  -  Seq Scan on article  (cost=0.00..0.00 rows=1 width=0)
Filter: ((cid = 729) AND (url_hash =
 '6851f596f55a994b2df417b53523fe45'::bpchar))
  -  Index Scan using article_729_url_hash on
 ​​
 article_729 article  (cost=0.00..8.68 rows=1 width=0)
Index Cond: (url_hash =
 '6851f596f55a994b2df417b53523fe45'::bpchar)
Filter: (cid = 729)
 (7 rows)

 Time: 361.401 ms

 # explain select 1 from article_729 where
 url_hash='6851f596f55a994b2df417b53523fe45';
  QUERY
 PLAN

 -
  Index Only Scan using article_729_url_hash on article_729
 (cost=0.00..8.67 rows=1 width=0)
Index Cond: (url_hash = '6851f596f55a994b2df417b53523fe45'::bpchar)
 (2 rows)

 Time: 0.898 ms

 This is only in normal condition. In extreme condition, the planing time
 could take several minutes. There seems some locking issue in query
 planing. How can I increase the plan performance? Or is it bad to partition
 table to 80 children in PostgreSQL?


 ​Hi,

 Could you provide full definition of ​​article_729 table (\dt+
 ​article_729)?
 80 partitions is adequate amount of partitions for the PostgreSQL, so
 there are going something unusual (I suspect it may be related to used
 partitioning schema).


 --
 Maxim Boguk
 Senior Postgresql DBA
 http://www.postgresql-consulting.ru/
 http://www.postgresql-consulting.com/




Re: [PERFORM] Query Plan Performance on Partitioned Table

2015-08-11 Thread Rural Hunter
article_729 has about 0.8 million rows. The rows of the children tables are
variance from several thousands to dozens of millions. How can it help to
create index on the partition key?

2015-08-12 1:03 GMT+08:00 Pietro Pugni pietro.pu...@gmail.com:

 Hi Rural Hunter,
 Try to create an index on cid attribute.
 How many rows has article_729?

 Pietro Pugni
 Il 11/ago/2015 16:51, Rural Hunter ruralhun...@gmail.com ha scritto:

 yes i'm very sure. from what i observed, it has something to do with the
 concurrent query planing. if i disconnect other connections, the plan is
 very quick.

 2015-08-11 22:42 GMT+08:00 Maxim Boguk maxim.bo...@gmail.com:



 Check constraints:
 article_729_cid_check CHECK (cid = 729)



 Used partition schema looks very simple and straightforward, and should
 have no issues with 80 partitions.
 Are you sure that you have only 80 partitions but not (lets say) 800?
 Are every other partition of the article table use the same general idea
 of partition check (cid=something)?


 Maxim Boguk
 Senior Postgresql DBA
 http://www.postgresql-consulting.ru/
 http://www.postgresql-consulting.com/

 Phone RU: +7 910 405 4718
 Phone AU: +61 45 218 5678

 LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
 Skype: maxim.boguk
 Jabber: maxim.bo...@gmail.com
 МойКруг: http://mboguk.moikrug.ru/

 People problems are solved with people.
 If people cannot solve the problem, try technology.
 People will then wish they'd listened at the first stage.





Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-15 Thread Rural Hunter

  
  
pls check this if it helps:
  http://ubuntuforums.org/showthread.php?t=2258734
  
  在 2015/3/15 18:54, Robert Kaye 写道:


  
  Hi!
  
  
  We at MusicBrainz have been having trouble with our
Postgres install for the past few days. I’ve collected all the
relevant information here:
  
  
    http://blog.musicbrainz.org/2015/03/15/postgres-troubles/
  
  
  If anyone could provide tips, suggestions or other
relevant advice for what to poke at next, we would love it.
  
  
  Thanks!
  
  
  

  --

--ruaok    

Robert Kaye -- r...@musicbrainz.org --
   http://musicbrainz.org


  


  




Re: [PERFORM] Very slow planning performance on partition table

2014-08-04 Thread Rural Hunter

Hi Tom,

Could my problem be a victim of this issue?
http://postgresql.1045698.n5.nabble.com/Planner-performance-extremely-affected-by-an-hanging-transaction-20-30-times-td5771686.html

is the patch mentioned in that thread applied in 9.2.9?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-08-04 Thread Rural Hunter

在 2014/7/30 18:03, Rural Hunter 写道:
I think I understand what happened now. I have another monitor script 
runs periodically and calls pg_cancel_backend and pg_terminate_backend 
for those hanging update sqls. However for some unkown reason the 
cancle and termiante command doesn't work at pgsql side for those 
update sqls.


With the log of the monitorkill scipt, I can confirm that the 
CLOSE_WAIT is not caused by it. I logged the netstat before actually 
doing the kill and found the CLOSE_WAIT connections were already there. 
So it must be something else caused the CLOSE_WAIT connections.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-30 Thread Rural Hunter
I think I understand what happened now. I have another monitor script 
runs periodically and calls pg_cancel_backend and pg_terminate_backend 
for those hanging update sqls. However for some unkown reason the cancle 
and termiante command doesn't work at pgsql side for those update sqls.


But I think pgbouncer side was notified by cancel or terminate command. 
It then drops old connections and creates new ones while those old 
connections still hang at pgsql side. That's why the connection status 
shows CLOST_WAIT and there are more processes at pgsql side than 
pgbouncer defined . So the root cause is still at pgsql side. It 
shouldn't hang there. What the hanging process was doing is in my 
previous posts. There many same concurrent sql which updates a 
partitioned table witouth partition key specified in conditions. The gdb 
trace shows this:

(gdb) bt
#0  0x7f8cea310db7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x005f97d3 in PGSemaphoreLock ()
#2  0x00638153 in LWLockAcquire ()
#3  0x004a90d0 in ginTraverseLock ()
#4  0x004a9d0b in ginFindLeafPage ()
#5  0x004a8377 in ginInsertItemPointers ()
#6  0x004a4548 in ginEntryInsert ()
#7  0x004ae687 in ginInsertCleanup ()
#8  0x004af3d6 in ginHeapTupleFastInsert ()
#9  0x004a4ab1 in gininsert ()
#10 0x00709b15 in FunctionCall6Coll ()
#11 0x0047b6b7 in index_insert ()
#12 0x0057f475 in ExecInsertIndexTuples ()
#13 0x0058bf07 in ExecModifyTable ()
#14 0x005766e3 in ExecProcNode ()
#15 0x00575ad4 in standard_ExecutorRun ()
#16 0x0064718f in ProcessQuery ()
#17 0x006473b7 in PortalRunMulti ()
#18 0x00647e8a in PortalRun ()
#19 0x00645160 in PostgresMain ()
#20 0x0060459e in ServerLoop ()
#21 0x006053bc in PostmasterMain ()
#22 0x005a686b in main ()
(gdb) q

It will just hangs there forever and finally blocks all other update 
sqls if I don't stop pgbouncer. When this happens, all the cpus will be 
utilized by those hanging processes and the server load is very very 
high. It keeps at serveral hundreds comparing with about 20 normally 
which causes the performance problem for all tasks on the server.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-29 Thread Rural Hunter
More information found. After the hang connection appears, I noticed 
there were several hundreds of connections of the same user. Since I use 
pgbouncer and I only set the pool size to 50 for each user, this is very 
strange.  I checked the pgbouncer side, 'show pools' showed the active 
server connection count is less than 50(only 35 actually). I also 
checked the client port which is shown in pg process list. It is not 
used at pgbouncer side when I did the check. So I stopped pgbouncer then 
the connection count from the user drops slowly. Finally all those 
connections disappeared. After that I restarted pgbouncer and it looks 
good again.
With this solution, I at least don't have to kill pg when the problem 
happens. But anyone has a clue why this happens? What I need to check 
for the root cause? One thing I forgot to check is the network status of 
those orphan connections at pg side. I will check it next time and see 
if they are in abnormal status.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-29 Thread Rural Hunter

在 2014/7/30 1:27, Jeff Janes 写道:



It sounds like someone is bypassing your pgbouncer and connecting 
directly to your database.  Maybe they tried to create their own 
parallelization and have a master connection going through pgbouncer 
and create many auxiliary connections that go directly to the database 
(probably because pgbouncer wouldn't let them create as many 
connections as they wanted through it).  That would explain why the 
connections slowly drain away once pgbouncer is shut down.


Can you change your pg_hba.conf file so that it only allows 
connections from pgbouncer's IP address?  This should flush out the 
culprit pretty quickly.


Cheers,

Jeff
I suspected that first. But after I checked a few things, I am quite 
sure this is not someone bypassing the pgbouncer.

1. The connections were all from the host of pgbouncer.
2. The id is an application id and no human has access to it. There was 
no other suspect applications running on the host of pgbouncer when the 
problem happened.
3. When I found the problem and checked the connections on the host of 
pgbouncer, those network connection actually didn't exist on the client 
side while they were still hanging at pg server side.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-29 Thread Rural Hunter
This happened again. This time I got the connection status(between 
pgbouncer host to pgsql host) at postgresql side. When the problem 
happens, the connection status is this:

ESTABLISHED: 188
CLOSE_WAIT: 116

The count of connections in CLOSE_WAIT is abnormal. Comparing with 
normal situation, there is usually no close_wait connection. The 
connection status sample is like this:

ESTABLISHED: 117
CLOSE_WAIT: 0

I have 4 users configured in pgbouncer and the pool_size is 50. So the 
max number of connections from pgbouncer should be less than 200.
The connection spike happens very quickly. I created a script to check 
the connections from pgbouncer. The script checks the connections from 
pgbouncer every 5 mins. This is the log:

10:55:01 CST pgbouncer is healthy. connection count: 73
11:00:02 CST pgbouncer is healthy. connection count: 77
11:05:01 CST pgbouncer is healthy. connection count: 118
11:10:01 CST pgbouncer is healthy. connection count: 115
11:15:01 CST pgbouncer is healthy. connection count: 75
11:20:01 CST pgbouncer is healthy. connection count: 73
11:25:02 CST pgbouncer is healthy. connection count: 75
11:30:01 CST pgbouncer is healthy. connection count: 77
11:35:01 CST pgbouncer is healthy. connection count: 84
11:40:10 CST Problematic connection count: 292, will restart pgbouncer...

Now I suspect there is some network problem between the hosts of 
pgbouncer and pgsql. Will check more.





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-29 Thread Rural Hunter
This was no error in the log of pgbouncer, but there is a sudden drop of 
request count when the problem happened:
2014-07-30 11:36:51.919 25369 LOG Stats: 2394 req/s, in 339478 b/s, out 
1422425 b/s,query 3792 us
2014-07-30 11:37:51.919 25369 LOG Stats: 2207 req/s, in 314570 b/s, out 
2291440 b/s,query 5344 us
2014-07-30 11:38:51.919 25369 LOG Stats: 2151 req/s, in 288565 b/s, out 
1945795 b/s,query 10016 us

[=problem happens=]
2014-07-30 11:39:51.919 25369 LOG Stats: 1061 req/s, in 140077 b/s, out 
2652730 b/s,query 515753 us

[=pgbouncer restart=]
2014-07-30 11:40:52.780 10640 LOG File descriptor limit: 65535 
(H:65535), max_client_conn: 5500, max fds possible: 6560

2014-07-30 11:40:52.781 10640 LOG Stale pidfile, removing
2014-07-30 11:40:52.782 10642 LOG listening on 0.0.0.0:
2014-07-30 11:40:52.782 10642 WARNING Cannot listen on ::/: bind(): 
Address already in use

2014-07-30 11:40:52.782 10642 LOG listening on unix:/tmp/.s.PGSQL.
2014-07-30 11:40:52.782 10642 LOG process up: pgbouncer 1.5.4, libevent 
1.4.13-stable (epoll), adns: libc-2.11
2014-07-30 11:41:52.781 10642 LOG Stats: 2309 req/s, in 331097 b/s, out 
3806033 b/s,query 4671 us
2014-07-30 11:42:52.782 10642 LOG Stats: 2044 req/s, in 285153 b/s, out 
2932543 b/s,query 4789 us
2014-07-30 11:43:52.782 10642 LOG Stats: 1969 req/s, in 282697 b/s, out 
560439 b/s,query 4607 us
2014-07-30 11:44:52.782 10642 LOG Stats: 2551 req/s, in 351589 b/s, out 
3223438 b/s,query 4364 us



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-28 Thread Rural Hunter

This is the vmstat output when the high load peak happens:
# vmstat 3
procs ---memory-- ---swap-- -io -system-- 
cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy 
id wa
54  1 756868 1047128 264572 34157347200   243   2721 2 51  7 
39  2
53  1 756888 924452 264508 34156608007 31379  3623 53110 184808 
29 55 15  1
70  1 756892 992416 264408 34153088003 14483  9455 53010 183758 
23 61 15  1
93  1 756900 954704 264160 34151420803 20280  3391 66607 304526 
23 59 17  1
65  2 756916 998524 263696 34142752005 23295  2084 53748 213259 
26 60 12  1
46  0 756924 969036 263636 34142108803 23508  1447 51134 200739 
22 59 19  1
123  1 756932 977336 263568 34142601603 21444  2747 48044 174390 
27 59 13  1
71  2 756932 975932 263580 34148352000 19328 89629 54321 234718 
25 59 14  2
47  5 756932 967004 263676 34150224000 19509 52652 56792 236648 
21 60 15  4
70  0 756944 1038464 263660 34146880004 21349  3584 51937 179806 
25 59 15  1
70  0 756940 923800 263532 34147571200 15135  1524 58201 236794 
21 59 19  1
40  1 756940 1022420 263560 34150656000  9163  4889 34702 130106 
19 61 19  1
59  0 756944 939380 263500 34151814401 22809  4024 46398 224644 
21 60 19  1
56  1 756956 954656 263464 34146944004 22927  4477 53705 175386 
28 57 14  1
39  0 756976 968204 263372 34137657607 24612  2556 61900 262784 
30 51 18  1
109  1 756984 1015260 263332 34132377603 16636  4039 29271 
83699  7 85  7  1
76  6 756992 980044 263312 34130812803  6949  1848 27496 130478  
6 90  2  2
103  0 756992 963540 263308 34135206400 22125  2493 20526 61133  
4 88  6  2


Seems most of the cpu is used by sys part.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-28 Thread Rural Hunter
I am now seeing another phenominom of hanging connections. They are 
showing 'UPDATE' status in process list.

(gdb) bt
#0  0x7f783f79d4f7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x005f97d3 in PGSemaphoreLock ()
#2  0x00638153 in LWLockAcquire ()
#3  0x004a9239 in ginStepRight ()
#4  0x004a9c61 in ginFindLeafPage ()
#5  0x004a8377 in ginInsertItemPointers ()
#6  0x004a4548 in ginEntryInsert ()
#7  0x004ae687 in ginInsertCleanup ()
#8  0x004af3d6 in ginHeapTupleFastInsert ()
#9  0x004a4ab1 in gininsert ()
#10 0x00709b15 in FunctionCall6Coll ()
#11 0x0047b6b7 in index_insert ()
#12 0x0057f475 in ExecInsertIndexTuples ()
#13 0x0058bf07 in ExecModifyTable ()
#14 0x005766e3 in ExecProcNode ()
#15 0x00575ad4 in standard_ExecutorRun ()
#16 0x0064718f in ProcessQuery ()
#17 0x006473b7 in PortalRunMulti ()
#18 0x00647e8a in PortalRun ()
#19 0x00645160 in PostgresMain ()
#20 0x0060459e in ServerLoop ()
#21 0x006053bc in PostmasterMain ()
#22 0x005a686b in main ()
(gdb) q

This connection can not be killed by pg_cancel_backend nor 
pg_terminate_backend. It just hangs there and does not respond to normal 
kill command.  I had to kill -9 the process to terminate whole 
postgresql instance. What happened there and how can I kill these 
connections safely?



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-28 Thread Rural Hunter

  
  
在 2014/7/29 1:29, Jeff Janes 写道:


  

  
If it were waiting on a pg_locks lock, the semop should
  be coming from ProcSleep, not from LWLockAcquire,
shouldn't it?

  
I'm
guessing he has a lot of connections, and each
connection is locking each partition in shared mode in
rapid fire, generating spin-lock or cache-line
contention.

  
Cheers,


Jeff
  

  

Yes. I have a lot of connections and they maybe coming together and
doing the same update statement without partition key on the
partition table.
  




Re: [PERFORM] Very slow planning performance on partition table

2014-07-27 Thread Rural Hunter
Anyone? I can see many pg processes are in BIND status with htop. Some 
of them could be hanging like 30 mins. I tried gdb on the same process 
many times and the trace shows same as my previous post. This happened 
after I partitioned my main tables to 60 children tables. And also, I'm 
experiecing a cpu peak around 30-60 mins every 1-2 days. During the 
peak, all my cpus(32 cores) are full utilized while there is no special 
load and the memory and io are fine. Sometimes I had to kill the db 
process and restart the db to escape the situation. I tried to upgrade 
to the latest 9.2.9 but it didn't help.


?? 2014/7/25 22:23, Rural Hunter :

I run dbg on the backend process and got this:
(gdb) bt
#0  0x7fc4a1b6cdb7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x005f8703 in PGSemaphoreLock ()
#2  0x00636703 in LWLockAcquire ()
#3  0x00632eb3 in LockAcquireExtended ()
#4  0x0062fdfb in LockRelationOid ()
#5  0x00474e55 in relation_open ()
#6  0x0047b39b in index_open ()
#7  0x005f3c22 in get_relation_info ()
#8  0x005f6590 in build_simple_rel ()
#9  0x005f65db in build_simple_rel ()
#10 0x005de8c0 in add_base_rels_to_query ()
#11 0x005df352 in query_planner ()
#12 0x005e0d51 in grouping_planner ()
#13 0x005e2bbe in subquery_planner ()
#14 0x005e2ef9 in standard_planner ()
#15 0x006426e1 in pg_plan_query ()
#16 0x0064279e in pg_plan_queries ()
#17 0x006f4b7a in BuildCachedPlan ()
#18 0x006f4e1e in GetCachedPlan ()
#19 0x00642259 in exec_bind_message ()
#20 0x00643561 in PostgresMain ()
#21 0x0060347f in ServerLoop ()
#22 0x00604121 in PostmasterMain ()
#23 0x005a5ade in main ()

Does that indicate something? seems it's waiting for some lock.





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-27 Thread Rural Hunter
Yes I checked. The connection I inspected is the longest running one. 
There was no other connections blocking it. And I also see all locks are 
granted for it. Does the planning phase require some internal locks?


?? 2014/7/28 0:28, Tom Lane :

Yeah, that's what the stack trace suggests.  Have you looked into pg_locks
and pg_stat_activity to see which lock it wants and what's holding said
lock?

regards, tom lane





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-25 Thread Rural Hunter

?? 2014/7/25 9:53, Tom Lane :
[ shrug... ] Insufficient data. When I try a simple test case based on 
what you've told us, I get planning times of a couple of milliseconds. 
I can think of contributing factors that would increase that, but not 
by four orders of magnitude. So there's something very significant 
that you've left out. Can you construct a self-contained test case 
that's this slow? regards, tom lane 


I run dbg on the backend process and got this:
(gdb) bt
#0  0x7fc4a1b6cdb7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x005f8703 in PGSemaphoreLock ()
#2  0x00636703 in LWLockAcquire ()
#3  0x00632eb3 in LockAcquireExtended ()
#4  0x0062fdfb in LockRelationOid ()
#5  0x00474e55 in relation_open ()
#6  0x0047b39b in index_open ()
#7  0x005f3c22 in get_relation_info ()
#8  0x005f6590 in build_simple_rel ()
#9  0x005f65db in build_simple_rel ()
#10 0x005de8c0 in add_base_rels_to_query ()
#11 0x005df352 in query_planner ()
#12 0x005e0d51 in grouping_planner ()
#13 0x005e2bbe in subquery_planner ()
#14 0x005e2ef9 in standard_planner ()
#15 0x006426e1 in pg_plan_query ()
#16 0x0064279e in pg_plan_queries ()
#17 0x006f4b7a in BuildCachedPlan ()
#18 0x006f4e1e in GetCachedPlan ()
#19 0x00642259 in exec_bind_message ()
#20 0x00643561 in PostgresMain ()
#21 0x0060347f in ServerLoop ()
#22 0x00604121 in PostmasterMain ()
#23 0x005a5ade in main ()

Does that indicate something? seems it's waiting for some lock.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-24 Thread Rural Hunter

2014/7/25 9:53, Tom Lane wrote:

Rural
[ shrug... ] Insufficient data.  When I try a simple test case based on
what you've told us, I get planning times of a couple of milliseconds.
I can think of contributing factors that would increase that, but not by
four orders of magnitude.  So there's something very significant that
you've left out.  Can you construct a self-contained test case that's
this slow?

regards, tom lane

No I cann't. I exported the db schema(without data) to another server 
and there is no problem. Is the planning time related to data volume? 
Anything else can I check? I already checked the default statistics 
target and it's the default value. I did change some statistics target 
on one column of the table, but the column is not involved in the slow 
planning query.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Very slow planning performance on partition table

2014-07-23 Thread Rural Hunter

Hi,

I have a table partitioned with about 60 children tables.  Now I found 
the planning time of simple query with partition key are very slow.
# explain analyze select count(*) as cnt from article where pid=88 and 
hash_code='2ca3ff8b17b163f0212c2ba01b80a064';

QUERY PLAN
-
 Aggregate  (cost=16.55..16.56 rows=1 width=0) (actual 
time=0.259..0.259 rows=1 loops=1)
   -  Append  (cost=0.00..16.55 rows=2 width=0) (actual 
time=0.248..0.250 rows=1 loops=1)
 -  Seq Scan on article  (cost=0.00..0.00 rows=1 width=0) 
(actual time=0.002..0.002 rows=0 loops=1)
   Filter: ((pid = 88) AND (hash_code = 
'2ca3ff8b17b163f0212c2ba01b80a064'::bpchar))
 -  Index Scan using article_88_hash_idx on article_88 
article  (cost=0.00..16.55 rows=1 width=0) (actual time=0.246..0.248 
rows=1 loops=1)
   Index Cond: (hash_code = 
'2ca3ff8b17b163f0212c2ba01b80a064'::bpchar)

   Filter: (pid = 88)
 Total runtime: 3.816 ms
(8 rows)

Time: 30999.986 ms

You can see the timing output that the actual run time of the 'explain 
analyze' is 30 seconds while the select sql itself takes only 3 ms. My 
partition key is on article.pid and the constraint is simple like this: 
CONSTRAINT article_88_pid_check CHECK (pid = 88). What's wrong and how 
can I improve the planning performance?



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow planning performance on partition table

2014-07-23 Thread Rural Hunter

  
  
It's the default value(partition): 
  # grep exclusion postgresql.conf
  #constraint_exclusion = partition?0?2?0?2?0?2 # on, off, or partition
  
  btw, I'm on postgresql 9.2.4
  
  ?? 2014/7/24 1:35, Douglas J Hunley :


  

  On Wed, Jul 23, 2014 at 6:21 AM,
Rural Hunter ruralhun...@gmail.com
wrote:
What's
  wrong and how can I improve the planning performance?
  
  
  What is constraint exclusion set to?
  
  
  
  -- 
  Douglas J Hunley (doug.hun...@gmail.com)

  


  




[PERFORM] How to investiage slow insert problem

2013-08-19 Thread Rural Hunter

Hi,

I'm on 9.2.4 with Ubuntu server. There are usually hundereds of 
connections doing the same insert with different data from different 
networks every minute, through pgbouncer in the same network of the 
database server. The database has been running for about one year 
without problem. Yesterday I got a problem that the connection count 
limit of the database server is reached. I checked the connections and 
found that there are many inserts hanging there. I checked the 
load(cpu,memory,io) of the db server but seems everything is fine. I 
also checked pg log and I only found there are one incomplete message 
from client error message every several minute. The I recycled 
pgbouncer and kept monitoring the connections. I found the majority of 
the inserts finish quickly but every minute there are several inserts 
left and seems hanging there . So after a while, the connection limit is 
reached again. Besides those inserts, there are no other long run 
queries and auto vacuums. I also checked the locks of the inserts and 
found they were all granted. The insert statement itself is very simple 
and it only inserts one row but there are some triggers involved. They 
might impact the performance but I have never experience any since the 
majority of the inserts are fine. The problem persisted about 1-2 hours. 
I didn't do anything except recycling pgbouncer a few times. After that 
period, everything goes back to normal. It's has been 24 hours and it 
didn't happen again.


From the error message in pg log, I supect it might be the network 
problem from some clients. Could anyone point out if there are other 
possible causes? I'm also wondering what those inserts are doing 
actually when they are hanging there, such as if they are in the trigger 
or not. Anything I can get similar with the connection snapshots in db2?



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to investiage slow insert problem

2013-08-19 Thread Rural Hunter

于 2013/8/20 10:38, Sergey Konoplev 写道:

On Mon, Aug 19, 2013 at 6:44 PM, Rural Hunter ruralhun...@gmail.com wrote:
What do you mean by recycling pgbouncer? 

I mean restarting pgbouncer.
Haven't you noticed what was in the state column of the 
pg_state_activity view? In 9.2 the query column in this view shows the 
last statement that was executed in this connection, and it does not 
mean that this statement is working at the moment of monitoring. If 
the state is active, than it was working, however, my assumption is 
that it was IDLE in transaction. 

No, they are alll with 'active' state.
You mentioned the incomplete message from client error, so it might 
somehow be a network problem that led to a hunging connection to 
pgbouncer, that made pgbouncer kept a connection to postgres after 
transaction was started. 
pgbouncer and the db server are in the same local network and there 
shouldn't be any network problem between them. I also ran ping from 
pgbouncer server to the db server and there was no problem.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to investiage slow insert problem

2013-08-19 Thread Rural Hunter

  
  
 2013/8/20 12:34, Jeff Janes :

On Monday, August 19, 2013, Rural Hunter wrote:
  
  
  I think that this should generally not happen at the server
if you are using pgbouncer, as you should configure it so that
pgbouncer has a lower limit than postgresql itself does. What
pooling method (session, transaction, statement) are you using?

statement. Currently, I set the limit of pgbouncer connection to
same as db connection. But I also have a few connections connecting
to db server directly.

  
  
  Can you provide some example numbers for the io load?

I get some when the connection limit is reached(The database related
storage is on sdb/sdd/sde/sdf):
root@ubtserver:~# iostat -xm 3
Linux 3.5.0-22-generic (ubuntu)  20130819  _x86_64_ (32
CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
 14.71 0.00 2.86 0.48 0.00 81.96

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.26 0.04 0.36 0.00 0.00
24.71 0.00 0.55 3.01 0.30 0.29 0.01
sdb 0.00 0.26 0.18 2.32 0.02 0.38
329.50 0.01 5.36 1.26 5.69 0.21 0.05
sdc 0.01 4.59 10.13 45.75 0.30 0.92
44.65 0.05 5.14 7.49 4.62 0.63 3.50
dm-0 0.00 0.00 0.00 0.01 0.00
0.00 8.00 0.00 6.37 6.38 6.36 3.62 0.00
sdd 0.00 0.42 0.02 42.87 0.00 0.46
22.12 0.03 0.78 14.09 0.77 0.49 2.10
sde 0.00 3.68 10.23 156.41 0.19 1.45
20.06 0.03 1.59 21.34 0.29 0.51 8.55
sdf 0.00 2.56 6.29 66.00 0.29 0.71
28.42 0.04 0.56 4.52 0.19 0.37 2.71

avg-cpu: %user %nice %system %iowait %steal %idle
 13.99 0.00 1.91 1.04 0.00 83.06

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.33 0.00 0.00 0.00
16.00 0.00 4.00 4.00 0.00 4.00 0.13
sdb 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 15.33 5.33 14.33 0.13 0.21
34.98 0.03 1.63 6.00 0.00 1.02 2.00
dm-0 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 31.33 0.00 0.26
17.19 0.01 0.34 0.00 0.34 0.34 1.07
sde 0.00 0.00 43.00 163.67 0.59 1.29
18.55 2.56 21.34 72.06 8.01 1.69 34.93
sdf 0.00 0.00 6.00 62.00 0.17 0.55
21.88 0.49 7.16 5.56 7.31 0.27 1.87

avg-cpu: %user %nice %system %iowait %steal %idle
 15.84 0.00 2.63 1.70 0.00 79.83

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1.67 0.00 2.00 0.00 0.01
14.67 0.07 33.33 0.00 33.33 25.33 5.07
sdb 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 4.67 0.00 0.06 0.00
26.29 0.13 6.29 6.29 0.00 25.14 11.73
dm-0 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.33 0.00 49.00 0.00 0.39
16.49 0.02 0.35 0.00 0.35 0.35 1.73
sde 0.00 11.00 30.67 81.33 0.38 0.71
19.98 36.46 143.19 43.91 180.62 2.69 30.13
sdf 0.00 9.33 3.00 326.00 0.09 2.75
17.69 3.51 10.66 5.33 10.71 0.11 3.60

avg-cpu: %user %nice %system %iowait %steal %idle
 14.99 0.00 2.39 4.89 0.00 77.74

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 19.67 7.33 29.00 0.09 0.60
38.61 1.18 35.41 175.45 0.00 15.93 57.87
dm-0 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.33 0.00 39.33 0.00 0.31
15.93 0.01 0.37 0.00 0.37 0.37 1.47
sde 0.00 11.33 29.67 312.67 0.39 2.51
17.34 87.15 314.23 108.13 333.78 2.84 97.20
sdf 0.00 0.00 8.33 0.00 0.17 0.00
42.24 0.05 6.56 6.56 0.00 2.40 2.00

avg-cpu: %user %nice %system %iowait %steal %idle
 14.98 0.00 2.23 5.45 0.00 77.34

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.67 0.00 0.01
20.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 9.67 10.00 6.00 0.12 0.10
27.83 0.08 5.08 8.13 0.00 1.42 2.27
dm-0 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 44.33 0.00 0.35
16.00 0.03 0.72 0.00 0.72 0.72 3.20
sde 0.00 0.00 47.33 0.00 0.58 0.00
25.18 5.26 111.04 111.04 0.00 19.10 90.40
sdf 0.00 11.00 3.33 683.33 0.12 7.38
22.37 12.05 17.54 244.00 16.44 0.49 33.33

avg-cpu: %user %nice %system %iowait %steal %idle
 15.21 0.00 2.54 0.56 0.00 81.69

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 2.00 0.00 1.00 0.00 0.01
24.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00

Re: [PERFORM] Geoserver-PostGIS performance problems

2012-07-25 Thread Rural Hunter
Why not just use simple Statement instead of PreparedStatement and 
construct the SQL with concated string or StringBuilder? like this:

int col1=xxx;
String col2=;
String sql=select * from table where col1=+col+ and col2='+col2+';

于 2012/7/26 3:59, Vinicius Abrahao 写道:

On Wed, Jul 25, 2012 at 4:26 PM, Merlin Moncure mmonc...@gmail.com wrote:

On Wed, Jul 25, 2012 at 2:17 PM, Vinicius Abrahao vinnix@gmail.com wrote:

On Wed, Jul 25, 2012 at 3:45 PM, Merlin Moncure mmonc...@gmail.com wrote:

Note that it seems the preparing/planning interaction was not the
poster's actual problem, but it may have been yours. As Tom Lane notes
in that thread, this should get better in 9.2.

jdbc should get some blame too -- it's really aggressive about
preparing queries.


indeed!
Is there any reason for that?

IMNSHO it's an oversight in the core JDBC design dating back to the
beginning: you have two basic choices for executing SQL.  The
unparameterized Statement or the parameterized PreparedStatement.
There should have been a 'ParamaterizedStatement' that gave the
expectation of paramaterization without setting up and permanent
server side structures to handle the query; libpq makes this
distinction and it works very well.  Of course, there are various ways
to work around this but the point stands.


That is true, I was observing the same, days ago:

Running queries and statments in jdbc:
https://github.com/vinnix/JavaLab/blob/master/Scrollable.java

And running queries with libpq:
https://github.com/vinnix/testLibPQ/blob/master/testlibpq.c

Is this possible to change something (I really don't know what or
where) in the jdbc driver
to get more direct aproach? (if that's make any sense to you guys...)

Best regards,

vinnix




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to change the index chosen in plan?

2012-06-11 Thread Rural Hunter

于 2012/6/11 20:07, Kevin Grittner 写道:

Rural Hunter  wrote:

于 2012/6/9 22:39, Kevin Grittner 写道:
  

You neglected to mention the LIMIT clause in your earlier
presentation of the problem. A LIMIT can have a big impact on plan
choice. Is the LIMIT 10 part of the actual query you want to
optimize? Either way it would be helpful to see the EXPLAIN
ANALYZE output for the the query without the LIMIT clause.

Yes, sorry for that. I do need the limit clause in the query to
show only part of the results to the user(common multi-pages view).
Without the limit clause, I got the plan as I wanted:
http://explain.depesz.com/s/Qdu

So looks either I remove the order-by or limit clause, I can get
what I wanted. But I do need the both in the query...
  
Well, we're still doing diagnostic steps.  What this one shows is

that your statistics are leading the planner to believe that there
will be 20846 rows with lid = 3072, while there are really only 62.
If it knew the actual number I doubt it would choose the slower plan.
  
The next thing I would try is:
  
ALTER TABLE article_label ALTER COLUMN lid SET STATISTICS = 5000;

ANALYZE article_label;
  
Then try the query without LIMIT and see if you get something on the

right order of magnitude comparing the estimated rows to actual on
that index scan.  You can try different STATISTICS values until you
get the lowest value that puts the estimate in the right
neighborhood.  Higher settings will increase plan time; lower
settings may lead to bad plans.
  
Once you've got a decent estimate, try with the ORDER BY and LIMIT

again.
I set statistics to 5000 and got estimated row count 559. Set statistics 
to 8000 and got estimated row count 393.  At this step, I run the query 
with both order-by and limit clause and got the expected result.
Kevin, Thank you very much for your patience and step-by-step guidance! 
I learnt a lot from this case!
  
If you have a hard time getting a good estimate even with a high

statistics target, you should investigate whether you have extreme
table bloat.
  
-Kevin






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to change the index chosen in plan?

2012-06-10 Thread Rural Hunter

于 2012/6/9 22:39, Kevin Grittner 写道:

Rural Hunter  wrote:

于 2012/6/9 0:39, Kevin Grittner 写道:
  

name | current_setting
  

full_page_writes | off
  
There may be exceptions on some file systems, but generally turning

this off leaves you vulnerable to possible database corruption if you
OS or hardware crashes.
Yes, I understand. My situation is, the io utiliztion of my system is 
quite high so I turned this off to reduce the io utilization. We have a 
replication server to serve as the hot standby if there is any issue on 
the primary. So currently I think it's acceptable option to me.
  

max_connections | 2500
  
Yikes!  You may want to look in to a connection pooler which can take

2500 client connections and funnel them into a much smaller number of
database connections.
  
https://wiki.postgresql.org/wiki/Number_Of_Database_Connections
  

shared_buffers | 60GB
  
You might want to compare your performance with this setting against

a smaller setting.  Many benchmarks have shown settings about a
certain point (like 8MB to 12 MB) to be counter-productive, although
a few have shown increased performance going past that.  It really
seems to depend on your hardware and workload, so you have to test to
find the sweet spot for your environment.
  

work_mem | 8MB
  
With so many connections, I can understand being this low.  One of

the advantages of using connection pooling to funnel your user
connections into fewer database conncections is that you can boost
this, which might help considerably with some types of queries.
  
None of the above, however, really gets to your immediate problem.

What is most significant about your settings with regard to the
problem query is what's *not* in that list.  You appear to have a
heavily cached active data set, based on the row counts and timings
in EXPLAIN ANALYZE output, and you have not adjusted your cost
factors, which assume less caching.
Thanks for the advices. As of now we don't see overall performance issue 
on the db. I will adjust these settings based on your advices if we 
begin to see overall performance degrade.
  
Try setting these on a connection and then running your queries on

that connection.
  
set seq_page_cost = 0.1;

set random_page_cost = 0.1;
set cpu_tuple_cost = 0.03;
I tried these settings but don't see noticeable improvement. The plan is 
not changed.
  

Ok, I get out a simple version of the actualy query. Here is the
explain anaylze without order-by, which is I wanted:
http://explain.depesz.com/s/p1p

Another with the order-by which I want to avoid:
http://explain.depesz.com/s/ujU
  
You neglected to mention the LIMIT clause in your earlier

presentation of the problem.  A LIMIT can have a big impact on plan
choice.  Is the LIMIT 10 part of the actual query you want to
optimize?  Either way it would be helpful to see the EXPLAIN ANALYZE
output for the the query without the LIMIT clause.
Yes, sorry for that. I do need the limit clause in the query to show 
only part of the results to the user(common multi-pages view). Without 
the limit clause, I got the plan as I wanted:

http://explain.depesz.com/s/Qdu

So looks either I remove the order-by or limit clause, I can get what I 
wanted. But I do need the both in the query...


  
-Kevin






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] how to change the index chosen in plan?

2012-06-08 Thread Rural Hunter

I have a query like this:
select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44 
and b.bid=8
postgresql selected the index on a.col1 then selected the index on 
b.bid. But in my situation, I know that the query will be faster if it 
chose the index on b.bid first since there are only a few rows with 
value 8. So I re-wrote the query as below:
select a.* from a where a.aid in (select aid from b where bid=8) and 
a.col1=33 a.col2=44
But surprisingly, postgresql didn't change the plan. it still chose to 
index scan on a.col1. How can I re-wirte the query so postgresql will 
scan on b.bid first?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to change the index chosen in plan?

2012-06-08 Thread Rural Hunter
No, it's not the analyze problem. For some other values on b.bid such 
as 9, 10, the plan is fine since there a a lot of rows in table b for 
them. But for some specific values such as 8 I want the plan changed.


于2012年6月8日 22:10:58,Tom Lane写到:

Rural Hunter ruralhun...@gmail.com writes:

I have a query like this:
select a.* from a inner join b on a.aid=b.aid where a.col1=33 a.col2=44
and b.bid=8
postgresql selected the index on a.col1 then selected the index on
b.bid. But in my situation, I know that the query will be faster if it
chose the index on b.bid first since there are only a few rows with
value 8.


If you know that and the planner doesn't, maybe ANALYZE is called for.

regards, tom lane





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to change the index chosen in plan?

2012-06-08 Thread Rural Hunter

Hi Kevin,
Thanks for your detailed explanation.

于 2012/6/8 22:37, Kevin Grittner 写道:

Rural Hunter ruralhun...@gmail.com wrote:

于2012年6月8日 22:10:58,Tom Lane写到:

Rural Hunter ruralhun...@gmail.com writes:

I have a query like this:
select a.* from a inner join b on a.aid=b.aid where a.col1=33
a.col2=44 and b.bid=8
postgresql selected the index on a.col1 then selected the index
on b.bid. But in my situation, I know that the query will be
faster if it chose the index on b.bid first since there are only
a few rows with value 8.

If you know that and the planner doesn't, maybe ANALYZE is called
for.


No, it's not the analyze problem.
  
So you ran ANALYZE and retried?  If not, please do.

Yes, I did.
  

For some other values on b.bid such as 9, 10, the plan is fine
since there a a lot of rows in table b for them.
  
So it uses the same plan regardless of the number of rows in table b

for the value?

yes.

That sure *sounds* like you need to run ANALYZE,
possibly after adjusting the statistics target for a column or two.

 How can adjust the statistics target?
  

But for some specific values such as 8 I want the plan changed.
  
If you approach it from that line of thought, you will be unlikely

to reach a good long-term solution.  PostgreSQL has a costing model
to determine which plan is expected to be cheapest (fastest).  This
is based on statistics gathered during ANALYZE and on costing
factors.  Generally, if it's not choosing the fastest plan, you
aren't running ANALYZE frequently enough or with a fine-grained
enough statistics target _or_ you need to adjust your costing
factors to better model your actual costs.
  
You haven't given us a lot of clues about which it is that you need

to do, but there is *some* suggestion that you need to ANALYZE.  If
you *try* that and it doesn't solve your problem, please read this
page and provide more information:
  
http://wiki.postgresql.org/wiki/SlowQueryQuestions
Sorry the actual tables and query are very complicated so I just 
simplified the problem with my understanding. I rechecked the query and 
found it should be simplified like this:
select a.* from a inner join b on a.aid=b.aid where a.col1=33 and 
a.col2=44 and a.timenow() and b.bid=8 order by a.time limit 10
There is an index on (a.col1,a.col2,a.time). If I remove the order-by 
clause, I can get the plan as I expected. I think that's why postgresql 
selected that index. But still I want the index on b.bid selected first 
for value 8 since there are only several rows with bid 8. though for 
other normal values there might be several kilo to million rows.
  
-Kevin






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how to change the index chosen in plan?

2012-06-08 Thread Rural Hunter

于 2012/6/9 0:39, Kevin Grittner 写道:

Rural Hunter ruralhun...@gmail.com wrote:
  

   How can adjust the statistics target?
  
default_statistics_target
  
http://www.postgresql.org/docs/current/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER
  
or ALTER TABLE x ALTER COLUMN y SET STATISTICS n
  
http://www.postgresql.org/docs/current/interactive/sql-altertable.html

Thanks, I will check detail.
  

Sorry the actual tables and query are very complicated so I just
simplified the problem with my understanding. I rechecked the
query and found it should be simplified like this:
select a.* from a inner join b on a.aid=b.aid where a.col1=33 and
a.col2=44 and a.timenow() and b.bid=8 order by a.time limit 10
There is an index on (a.col1,a.col2,a.time). If I remove the
order-by clause, I can get the plan as I expected. I think that's
why postgresql selected that index.
  
Sounds like it expects the sort to be expensive, which means it

probably expects a large number of rows.  An EXPLAIN ANALYZE of the
query with and without the ORDER BY might be instructive.  It would
also help to know what version of PostgreSQL you have and how it is
configured, all of which shows up in the results of the query on
this page:
  
http://wiki.postgresql.org/wiki/Server_Configuration
  

Here is the output:
name | current_setting
-+---
version | PostgreSQL 9.1.3 on x86_64-unknown-linux-gnu, compiled by gcc 
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
archive_command | test ! -f /dbbk/postgres/logarch/%f.gz  gzip -c %p 
/dbbk/postgres/logarch/%f.gz

archive_mode | on
autovacuum | on
autovacuum_freeze_max_age | 20
checkpoint_segments | 20
client_encoding | UTF8
effective_cache_size | 150GB
full_page_writes | off
lc_collate | zh_CN.utf8
lc_ctype | zh_CN.utf8
listen_addresses | *
log_autovacuum_min_duration | 30min
log_destination | stderr
log_line_prefix | %t [%u@%h]
log_min_duration_statement | 10s
log_statement | ddl
logging_collector | on
maintenance_work_mem | 10GB
max_connections | 2500
max_stack_depth | 2MB
max_wal_senders | 1
port | 3500
server_encoding | UTF8
shared_buffers | 60GB
synchronous_commit | off
TimeZone | PRC
track_activities | on
track_counts | on
vacuum_freeze_table_age | 10
wal_buffers | 16MB
wal_level | hot_standby
work_mem | 8MB
(33 rows)


But still I want the index on b.bid selected first
for value 8 since there are only several rows with bid 8. though
for other normal values there might be several kilo to million
rows.
  
An EXPLAIN ANALYZE of one where you think the plan is a good choice

might also help.
Ok, I get out a simple version of the actualy query. Here is the explain 
anaylze without order-by, which is I wanted:

http://explain.depesz.com/s/p1p

Another with the order-by which I want to avoid:
http://explain.depesz.com/s/ujU

This is the count of rows in article_label with value 3072(which I 
referred as table b in previous mail):

# select count(*) from article_label where lid=3072;
count
---
56
(1 row)

  
Oh, and just to be sure -- are you actually running queries with the

literals like you show, or are you using prepared statements with
placeholders and plugging the values in after the statement is
prepared?  Sample code, if possible, might help point to or
eliminate issues with a cached plan.  If you're running through a
cached plan, there is no way for it to behave differently based on
the value plugged into the query -- the plan has already been set
before you get to that point.

Yes, I ran the query directly wih psql.
  
-Kevin






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query got slow from 9.0 to 9.1 upgrade

2012-05-03 Thread Rural Hunter
This is very similar with my problem: 
http://postgresql.1045698.n5.nabble.com/index-choosing-problem-td5567320.html


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] index choosing problem

2012-03-15 Thread Rural Hunter
I have a table with serveral million records. they are divided into 
about one hundred catagory(column cid). I created index includes the cid 
as the first column. I had a problem with some cids they only have few 
records comparing with other cids. Some of them only have serveral 
thousand rows. Some queries are not using index on the cids. I got the 
explain for the queries.

Note:
article_others_cid_time_style_idx is the index contains cid as the first 
column

article_others_pkey is the primary key on an auto incremented column aid.

# select count(*) from article_others;
  count
-
 6888459
(1 row)

# select count(*) from article_others where cid=74;
 count
---
  4199
(1 row)

1. # explain select count(*) from article_others where cid=74;
  QUERY PLAN
--
 Aggregate  (cost=32941.95..32941.96 rows=1 width=0)
   -  Index Scan using article_others_cid_time_style_idx on 
article_others  (cost=0.00..32909.34 rows=13047 width=0)

 Index Cond: (cid = 74)
(3 rows)

2. # explain select aid from article_others where cid=74 limit 10;
  QUERY PLAN
--
 Limit  (cost=0.00..25.22 rows=10 width=8)
   -  Index Scan using article_others_cid_time_style_idx on 
article_others  (cost=0.00..32909.34 rows=13047 width=8)

 Index Cond: (cid = 74)
(3 rows)

3. # explain select aid from article_others where cid=74 order by aid 
desc limit 10;

QUERY PLAN
---
 Limit  (cost=0.00..1034.00 rows=10 width=8)
   -  Index Scan Backward using article_others_pkey on article_others  
(cost=0.00..1349056.65 rows=13047 width=8)

 Filter: (cid = 74)
(3 rows)

4. # explain select aid from article_others where cid=74 order by aid 
desc limit 1;

QUERY PLAN
---
 Limit  (cost=0.00..103.40 rows=1 width=8)
   -  Index Scan Backward using article_others_pkey on article_others  
(cost=0.00..1349060.65 rows=13047 width=8)

 Filter: (cid = 74)
(3 rows)

5. # explain select max(aid) from article_others where cid=74;
QUERY PLAN
---
 Result  (cost=104.70..104.71 rows=1 width=0)
   InitPlan 1 (returns $0)
 -  Limit  (cost=0.00..104.70 rows=1 width=8)
   -  Index Scan Backward using article_others_pkey on 
article_others  (cost=0.00..1365988.55 rows=13047 width=8)

 Index Cond: (aid IS NOT NULL)
 Filter: (cid = 74)
(6 rows)

Now the query 3-5 using article_others_pkey are quite slow. The rows for 
cid 74 are very old and seldom get updated. I think pg needs to scan 
quite a lot on article_others_pkey before it gets the rows for cid 74. 
The same query for other cids with new and majority of rows runs very 
fast. for example:

# explain select max(aid) from article_others where cid=258;
 QUERY PLAN

 Result  (cost=1.54..1.55 rows=1 width=0)
   InitPlan 1 (returns $0)
 -  Limit  (cost=0.00..1.54 rows=1 width=8)
   -  Index Scan Backward using article_others_pkey on 
article_others  (cost=0.00..1366260.55 rows=889520 width=8)

 Index Cond: (aid IS NOT NULL)
 Filter: (cid = 258)

So I think if pg chooses to use index article_others_cid_time_style_idx 
the performance would be much better. or any other solution I can take 
to improve the query performance for those cids like 74?
Another question, why the plan shows rows=13047 for cid=74 while 
actually it only has 4199 rows? There is almost no data changes for cid 
74 and I just vacuum/analyzed the table this morning.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Gin index insert performance issue

2012-03-12 Thread Rural Hunter
I'm using gin index for my full text search engine in production. These 
days the gin index size grows to 20-30G and the system started to suffer 
with periodical insert hanging. This is same as described in the 2 posts:

http://postgresql.1045698.n5.nabble.com/Random-penalties-on-GIN-index-updates-td2073848.html
http://postgresql.1045698.n5.nabble.com/Periodically-slow-inserts-td3230434.html

The gin index is on a dedicated raid 10 SAS disk and the performance 
should be enough for normal db operation. But I always see almost 100% 
disk utiliztion on the disk when the inserts hang. The utiliztion for 
other data(such as the full text table data) on another disk(same setup 
as the gin index disk: SAS raid 10) is quite low comparing with the gin 
index disk. From my observation, looks too much data is written to the 
disk when the pending list of gin index is flushed to the disk. Below is 
the outupt of 'iostat -xm 3' on the disk when inserts hang:
Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s 
avgrq-sz avgqu-sz   await  svctm  %util
sde   0.00 0.000.67 2614.00 0.0822.94
18.0332.94   12.61   0.38 100.00
sde   0.00 0.001.67 2377.33 0.1720.43
17.7332.00   13.44   0.42 100.00
sde   0.00 0.00   15.67 2320.33 0.2320.13
17.8531.99   13.73   0.43 100.00
sde   0.00 0.007.33 1525.00 0.1214.02
18.9032.00   20.83   0.65 100.00
sde   0.00 0.00   14.33 1664.67 0.1215.54
19.1032.00   19.06   0.60 100.00
sde   0.00 0.005.33 1654.33 0.0412.07
14.9432.00   19.22   0.60 100.00


I tried to increase work_mem but the inserts hang more time each time 
with less frequency.  So it makes almost no difference for the total 
hanging time. Frequent vacuum is not a choice since the hang happens 
very 3-5 mins. is there any improvement I can make with pg for such data 
volumn(still increasing) or it's time to turn to other full text search 
solution such as lucene etc?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Is it possible to use index on column for regexp match operator '~'?

2011-12-14 Thread Rural Hunter
for example, the where condition is: where '' ~ col1. I created a 
normal index on col1 but seems it is not used.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?

2011-12-14 Thread Rural Hunter
actually I stored the pattern in col1. I want to get the row whose col1 
pattern matches one string 'aaa'.


于2011年12月15日 4:43:37,Marti Raudsepp写到:

2011/12/14 Rural Hunterruralhun...@gmail.com:

for example, the where condition is: where '' ~ col1. I created a normal
index on col1 but seems it is not used.


I assume you want to search values that match one particular pattern,
that would be col1 ~ ''

The answer is, only very simple patterns that start with '^'. Note
that you MUST use the text_pattern_ops index opclass:

# create table words (word text);
# copy words from '/usr/share/dict/words';
# create index on words (word text_pattern_ops);
# explain select * from words where word ~ '^post';
Index Scan using words_word_idx on words  (cost=0.00..8.28 rows=10 width=9)
   Index Cond: ((word ~=~ 'post'::text) AND (word ~~ 'posu'::text))
   Filter: (word ~ '^post'::text)



If you just want to search for arbitrary strings, in PostgreSQL 9.1+
you can use pg_trgm extension with a LIKE expression:

# create extension pg_trgm;
# create index on words using gist (word gist_trgm_ops);
# explain select * from words where word like '%post%';
Bitmap Heap Scan on words  (cost=4.36..40.23 rows=10 width=9)
   Recheck Cond: (word ~~ '%post%'::text)
   -   Bitmap Index Scan on words_word_idx1  (cost=0.00..4.36 rows=10 width=0)
 Index Cond: (word ~~ '%post%'::text)



There's also the wildspeed external module which is somewhat faster
at this: http://www.sai.msu.su/~megera/wiki/wildspeed

And someone is working to get pg_trgm support for arbitrary regular
expression searches. This *may* become part of the next major
PostgreSQL release (9.2)
http://archives.postgresql.org/message-id/CAPpHfduD6EGNise5codBz0KcdDahp7--MhFz_JDD_FRPC7-i=a...@mail.gmail.com

Regards,
Marti





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance