Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
On 28/12/13 18:19, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? It is a bit tricky to tell without any relevant information (e.g schema description). But a likely culprit would be a missing index on the relevant 'tenant_id' type field in each table that you are using to distinguish the various tenant datasets. Regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
I am using multiple databases in my app for multi tenacy environment. In my app user login with tenant id and connect to respective database. The scenario is when 5 users connect to 5 databases and perform some activity in that case site performance going down and some time database server become unresponsive. In each database we have same schema with different data regarding to users of that tenant. Any idea? On Sat, Dec 28, 2013 at 3:24 PM, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: On 28/12/13 18:19, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? It is a bit tricky to tell without any relevant information (e.g schema description). But a likely culprit would be a missing index on the relevant 'tenant_id' type field in each table that you are using to distinguish the various tenant datasets. Regards Mark -- -- Regards @Ankush Upadhyay@
Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
On 12/28/2013 12:19 AM, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? I usually use the term multi-tenancy to refer to different postgres instances running on the same machine, rather than different databases within a single instance of postgres. So lease describe your setup in more detail. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
On Sat, Dec 28, 2013 at 6:50 PM, Andrew Dunstan and...@dunslane.net wrote: On 12/28/2013 12:19 AM, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? I usually use the term multi-tenancy to refer to different postgres instances running on the same machine, rather than different databases within a single instance of postgres. So lease describe your setup in more detail. cheers andrew First of all Thanks Andrew for let me know email etiquette and extremely sorry for confusion. Here I meant to say that different postgres instances running on the same machine. Actually I have one application machine and one database server machine with multiple postgres instances running on it and accessing by application server. I hope this time I could explain it in more details. -- -- Regards @Ankush Upadhyay@
Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system
On 12/28/2013 08:46 AM, ankush upadhyay wrote: On Sat, Dec 28, 2013 at 6:50 PM, Andrew Dunstan and...@dunslane.net mailto:and...@dunslane.net wrote: On 12/28/2013 12:19 AM, ankush upadhyay wrote: Hello All, I am using multi tenant system and doing performance testing of multi tenant application. In case of single tenant it is working fine but once I enable tenants, then some time database servers not responding. Any clue? I usually use the term multi-tenancy to refer to different postgres instances running on the same machine, rather than different databases within a single instance of postgres. So lease describe your setup in more detail. cheers andrew First of all Thanks Andrew for let me know email etiquette and extremely sorry for confusion. Here I meant to say that different postgres instances running on the same machine. Actually I have one application machine and one database server machine with multiple postgres instances running on it and accessing by application server. I hope this time I could explain it in more details. Why are you doing that, as opposed to running multiple databases in a single instance? Running more than a handful of instances in a single machine is almost always a recipe for poor performance. The vast majority of users in my experience run a single postgres instance per machine, possibly with a large number of databases. cheers andrew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Michael Kolomeitsev mkolomeit...@gmail.com wrote: it is clear for me why t1_b_a_idx is better. The question is: Is postgresql able to see that? For a number of reasons I never consider a bulk load complete until I run VACUUM FREEZE ANALYZE on the table(s) involved. When I try your test case without that, I get the bad index choice. When I then run VACUUM FREEZE ANALYZE on the database I get the good index choice. There may be some lesser maintenance which sets up visibility information and provides the planner with enough data to make a good choice, I just noticed that you were not following what I consider to be rote good practice, tried it, and it solved the problem. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
On 29/12/13 10:03, Kevin Grittner wrote: Michael Kolomeitsev mkolomeit...@gmail.com wrote: it is clear for me why t1_b_a_idx is better. The question is: Is postgresql able to see that? For a number of reasons I never consider a bulk load complete until I run VACUUM FREEZE ANALYZE on the table(s) involved. When I try your test case without that, I get the bad index choice. When I then run VACUUM FREEZE ANALYZE on the database I get the good index choice. There may be some lesser maintenance which sets up visibility information and provides the planner with enough data to make a good choice, I just noticed that you were not following what I consider to be rote good practice, tried it, and it solved the problem. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company Curious: Would it be feasible to do some kind of ANALYZE during a bulk operation? Say if you could tell the system you expected to change 20% of the records in advance: then you could sample some of the changes and modify the statistics with 0.2 times that plus 0.8 of the pre-existing statistics. BEGIN BULK OPERATION CHANGE 20% [... several transactions ...] END BULK OPERATION The sampling could be done as part of the individual operations or at the end of the bulk operation - whichever is deemed more practicable (possibly a bit of both?). Cheers, Gavin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Kevin Grittner kgri...@ymail.com writes: Michael Kolomeitsev mkolomeit...@gmail.com wrote: it is clear for me why t1_b_a_idx is better. The question is: Is postgresql able to see that? For a number of reasons I never consider a bulk load complete until I run VACUUM FREEZE ANALYZE on the table(s) involved. When I try your test case without that, I get the bad index choice. When I then run VACUUM FREEZE ANALYZE on the database I get the good index choice. I think that's just chance, because AFAICS the cost estimates are exactly the same for both indexes, once you've done the vacuum to make all the heap pages all-visible. What's more, I'm not sure that that's wrong, because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of index pages are touched for either index. So I think Michael's claim that the one index is better is at best unproven. regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 33; QUERY PLAN --- Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 loops=1) Buffers: shared hit=30 - Index Only Scan using t1_b_a_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.044..0.085 rows=7 loops=1) Index Cond: ((b = 33) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[]))) Heap Fetches: 0 Buffers: shared hit=30 Total runtime: 0.174 ms (7 rows) regression=# begin; drop index t1_b_a_idx; BEGIN DROP INDEX regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 33; QUERY PLAN --- Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 loops=1) Buffers: shared hit=30 - Index Only Scan using t1_a_b_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.039..0.101 rows=7 loops=1) Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 33)) Heap Fetches: 0 Buffers: shared hit=30 Total runtime: 0.199 ms (7 rows) regression=# abort; ROLLBACK I grant the theory that the repeated index probes in t1_b_a_idx should be more localized than those in t1_a_b_idx, but PG's optimizer doesn't attempt to estimate such effects, and this example isn't doing much to convince me that it'd be worth the trouble. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
2013/12/29 Tom Lane t...@sss.pgh.pa.us I think that's just chance, because AFAICS the cost estimates are exactly the same for both indexes, once you've done the vacuum to make all the heap pages all-visible. What's more, I'm not sure that that's wrong, because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of index pages are touched for either index. So I think Michael's claim that the one index is better is at best unproven. Let me prove :) 1. I do benchmarking after dropping Pg and OS disk caches/buffers. In a way I posted in my first message: sh# systemctl stop postgresql.service ; echo 3 /proc/sys/vm/drop_caches ; systemctl start postgresql.service And timing results are quite stable: 200-210ms using t1_a_b_idx and 90-100ms using t1_b_a_idx. Trying 'explain (analyze, buffers) ... ' I got this: * using t1_a_b_idx: QUERY PLAN -- Aggregate (cost=46.62..46.63 rows=1 width=0) (actual time=228.853..228.854 rows=1 loops=1) Buffers: shared hit=12 read=23 - Index Only Scan using t1_a_b_idx on t1 (cost=0.57..46.60 rows=7 width=0) (actual time=52.171..228.816 rows=7 loops=1) Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 33)) Heap Fetches: 7 Buffers: shared hit=12 read=23 Total runtime: 229.012 ms * using t1_b_a_idx: QUERY PLAN -- Aggregate (cost=60.12..60.13 rows=1 width=0) (actual time=115.617..115.617 rows=1 loops=1) Buffers: shared hit=24 read=11 - Index Only Scan using t1_b_a_idx on t1 (cost=0.57..60.10 rows=7 width=0) (actual time=80.460..115.590 rows=7 loops=1) Index Cond: ((b = 33) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[]))) Heap Fetches: 7 Buffers: shared hit=24 read=11 Total runtime: 116.480 ms There is a difference in read operations and moreover in cost estimation. 23 - 11 = 12 excess read operations. If they are all random reads they may take ~100ms on typical home/workstation SATA hard drive. That's the difference between timings I showed above. Yes, I understand that Pg doesn't know (while planning the query) how many pages will be hitted in shared buffers. But I can't get why there is the same buffers count (35) in both plans... And I can't get why I have different cost estimations... I grant the theory that the repeated index probes in t1_b_a_idx should be more localized than those in t1_a_b_idx, but PG's optimizer doesn't Yes, I see t1_a_b_idx and t1_b_a_idx have 3 levels in btree. For t1_a_b_idx Pg have to read 1 (header) + 1 (root) + 1 (common level 1 node) + 7 * 2 = 17 pages in it and for t1_b_a_idx 1 + 1 + 3 = 5 pages ('cause all 7 pairs of (a, b) are located in one btree leaf node). 17 - 5 = 12 - this is the same difference as we can see in 'explain (analyze, buffers)'. attempt to estimate such effects, and this example isn't doing much to convince me that it'd be worth the trouble. In a real life situation I have two kinds of queries for that table: * select ... from t1 where a in (...) and b = ? * select ... from t1 where a = ? and b in (...) I select fields from t1 that are not in indexes thus there is no 'Index Only Scan', more random reads and performance impact of choosing t1_a_b_idx in both queries is a bit lesser. And I got the answer (PG's optimizer doesn't attempt to estimate such effects) for my situation. Thanks a lot.