Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread Mark Kirkwood

On 28/12/13 18:19, ankush upadhyay wrote:

Hello All,

I am using multi tenant system and doing performance testing of multi
tenant application. In case of single tenant it is working fine but once
I enable tenants, then some time database servers not responding. Any clue?



It is a bit tricky to tell without any relevant information (e.g schema 
description). But a likely culprit would be a missing index on the 
relevant 'tenant_id' type field in each table that you are using to 
distinguish the various tenant datasets.


Regards

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread ankush upadhyay
I am using multiple databases in my app for multi tenacy environment.

In my app user login with tenant id and connect to respective database.

The scenario is when 5 users connect to 5 databases and perform some
activity in that case site performance going down and some time database
server become unresponsive.

In each database we have same schema with different data regarding to users
of that tenant.
Any idea?


On Sat, Dec 28, 2013 at 3:24 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 28/12/13 18:19, ankush upadhyay wrote:

 Hello All,

 I am using multi tenant system and doing performance testing of multi
 tenant application. In case of single tenant it is working fine but once
 I enable tenants, then some time database servers not responding. Any
 clue?


 It is a bit tricky to tell without any relevant information (e.g schema
 description). But a likely culprit would be a missing index on the relevant
 'tenant_id' type field in each table that you are using to distinguish the
 various tenant datasets.

 Regards

 Mark




-- 
--
Regards
@Ankush Upadhyay@


Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread Andrew Dunstan


On 12/28/2013 12:19 AM, ankush upadhyay wrote:

Hello All,

I am using multi tenant system and doing performance testing of multi 
tenant application. In case of single tenant it is working fine but 
once I enable tenants, then some time database servers not responding. 
Any clue?







I usually use the term multi-tenancy to refer to different postgres 
instances running on the same machine, rather than different databases 
within a single instance of postgres. So lease describe your setup in 
more detail.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread ankush upadhyay
On Sat, Dec 28, 2013 at 6:50 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 12/28/2013 12:19 AM, ankush upadhyay wrote:

 Hello All,

 I am using multi tenant system and doing performance testing of multi
 tenant application. In case of single tenant it is working fine but once I
 enable tenants, then some time database servers not responding. Any clue?





 I usually use the term multi-tenancy to refer to different postgres
 instances running on the same machine, rather than different databases
 within a single instance of postgres. So lease describe your setup in more
 detail.

 cheers

 andrew



First of all Thanks Andrew for let me know email etiquette and extremely
sorry for confusion.

Here I meant to say that  different postgres instances running on the same
machine.

Actually I have one application machine and one database server machine
with multiple postgres instances running on it and accessing by application
server.

I hope this time I could explain it in more details.

-- 
--
Regards
@Ankush Upadhyay@


Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread Andrew Dunstan


On 12/28/2013 08:46 AM, ankush upadhyay wrote:
On Sat, Dec 28, 2013 at 6:50 PM, Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net wrote:



On 12/28/2013 12:19 AM, ankush upadhyay wrote:

Hello All,

I am using multi tenant system and doing performance testing
of multi tenant application. In case of single tenant it is
working fine but once I enable tenants, then some time
database servers not responding. Any clue?





I usually use the term multi-tenancy to refer to different
postgres instances running on the same machine, rather than
different databases within a single instance of postgres. So lease
describe your setup in more detail.

cheers

andrew



First of all Thanks Andrew for let me know email etiquette and 
extremely sorry for confusion.


Here I meant to say that  different postgres instances running on the 
same machine.


Actually I have one application machine and one database server 
machine with multiple postgres instances running on it and accessing 
by application server.


I hope this time I could explain it in more details.





Why are you doing that, as opposed to running multiple databases in a 
single instance? Running more than a handful of instances in a single 
machine is almost always a recipe for poor performance. The vast 
majority of users in my experience run a single postgres instance per 
machine, possibly with a large number of databases.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

2013-12-28 Thread Kevin Grittner
Michael Kolomeitsev mkolomeit...@gmail.com wrote:

 it is clear for me why t1_b_a_idx is better. The question is: Is
 postgresql able to see that?

For a number of reasons I never consider a bulk load complete until
I run VACUUM FREEZE ANALYZE on the table(s) involved.  When I try
your test case without that, I get the bad index choice.  When I
then run VACUUM FREEZE ANALYZE on the database I get the good index
choice.

There may be some lesser maintenance which sets up visibility
information and provides the planner with enough data to make a
good choice, I just noticed that you were not following what I
consider to be rote good practice, tried it, and it solved the
problem.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

2013-12-28 Thread Gavin Flower

On 29/12/13 10:03, Kevin Grittner wrote:

Michael Kolomeitsev mkolomeit...@gmail.com wrote:


it is clear for me why t1_b_a_idx is better. The question is: Is
postgresql able to see that?

For a number of reasons I never consider a bulk load complete until
I run VACUUM FREEZE ANALYZE on the table(s) involved.  When I try
your test case without that, I get the bad index choice.  When I
then run VACUUM FREEZE ANALYZE on the database I get the good index
choice.

There may be some lesser maintenance which sets up visibility
information and provides the planner with enough data to make a
good choice, I just noticed that you were not following what I
consider to be rote good practice, tried it, and it solved the
problem.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Curious: Would it be feasible to do some kind of ANALYZE during a bulk 
operation?  Say if you could tell the system you expected to change 20% 
of the records in advance: then you could sample some of the changes and 
modify the statistics with 0.2 times that plus 0.8 of the pre-existing 
statistics.


BEGIN BULK OPERATION CHANGE 20%
[... several transactions ...]
END BULK OPERATION

The sampling could be done as part of the individual operations or at 
the end of the bulk operation - whichever is deemed more practicable 
(possibly a bit of both?).



Cheers,
Gavin


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

2013-12-28 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Michael Kolomeitsev mkolomeit...@gmail.com wrote:
 it is clear for me why t1_b_a_idx is better. The question is: Is
 postgresql able to see that?

 For a number of reasons I never consider a bulk load complete until
 I run VACUUM FREEZE ANALYZE on the table(s) involved.  When I try
 your test case without that, I get the bad index choice.  When I
 then run VACUUM FREEZE ANALYZE on the database I get the good index
 choice.

I think that's just chance, because AFAICS the cost estimates are exactly
the same for both indexes, once you've done the vacuum to make all the
heap pages all-visible.  What's more, I'm not sure that that's wrong,
because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
index pages are touched for either index.  So I think Michael's claim that
the one index is better is at best unproven.

regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 
9, 17, 26, 35, 41, 50) and b = 33;
QUERY PLAN  
   
---
 Aggregate  (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 
loops=1)
   Buffers: shared hit=30
   -  Index Only Scan using t1_b_a_idx on t1  (cost=0.57..32.10 rows=7 
width=0) (actual time=0.044..0.085 rows=7 loops=1)
 Index Cond: ((b = 33) AND (a = ANY 
('{1,9,17,26,35,41,50}'::integer[])))
 Heap Fetches: 0
 Buffers: shared hit=30
 Total runtime: 0.174 ms
(7 rows)

regression=# begin; drop index t1_b_a_idx;
BEGIN
DROP INDEX
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 
9, 17, 26, 35, 41, 50) and b = 33;
QUERY PLAN  
   
---
 Aggregate  (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 
loops=1)
   Buffers: shared hit=30
   -  Index Only Scan using t1_a_b_idx on t1  (cost=0.57..32.10 rows=7 
width=0) (actual time=0.039..0.101 rows=7 loops=1)
 Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 
33))
 Heap Fetches: 0
 Buffers: shared hit=30
 Total runtime: 0.199 ms
(7 rows)

regression=# abort;
ROLLBACK

I grant the theory that the repeated index probes in t1_b_a_idx should be
more localized than those in t1_a_b_idx, but PG's optimizer doesn't
attempt to estimate such effects, and this example isn't doing much to
convince me that it'd be worth the trouble.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

2013-12-28 Thread Michael Kolomeitsev
2013/12/29 Tom Lane t...@sss.pgh.pa.us

 I think that's just chance, because AFAICS the cost estimates are exactly
 the same for both indexes, once you've done the vacuum to make all the
 heap pages all-visible.  What's more, I'm not sure that that's wrong,
 because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
 index pages are touched for either index.  So I think Michael's claim that
 the one index is better is at best unproven.


Let me prove :)

1. I do benchmarking after dropping Pg and OS disk caches/buffers.
In a way I posted in my first message:
sh# systemctl stop postgresql.service ; echo 3 /proc/sys/vm/drop_caches ;
systemctl start postgresql.service

And timing results are quite stable: 200-210ms using t1_a_b_idx and
90-100ms using t1_b_a_idx.

Trying 'explain (analyze, buffers) ... ' I got this:
* using t1_a_b_idx:
  QUERY PLAN
--
 Aggregate  (cost=46.62..46.63 rows=1 width=0) (actual
time=228.853..228.854 rows=1 loops=1)
   Buffers: shared hit=12 read=23
   -  Index Only Scan using t1_a_b_idx on t1  (cost=0.57..46.60 rows=7
width=0) (actual time=52.171..228.816 rows=7 loops=1)
 Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b
= 33))
 Heap Fetches: 7
 Buffers: shared hit=12 read=23
 Total runtime: 229.012 ms


* using t1_b_a_idx:
  QUERY PLAN
--
 Aggregate  (cost=60.12..60.13 rows=1 width=0) (actual
time=115.617..115.617 rows=1 loops=1)
   Buffers: shared hit=24 read=11
   -  Index Only Scan using t1_b_a_idx on t1  (cost=0.57..60.10 rows=7
width=0) (actual time=80.460..115.590 rows=7 loops=1)
 Index Cond: ((b = 33) AND (a = ANY
('{1,9,17,26,35,41,50}'::integer[])))
 Heap Fetches: 7
 Buffers: shared hit=24 read=11
 Total runtime: 116.480 ms


There is a difference in read operations and moreover in cost estimation.
23 - 11 = 12 excess read operations. If they are all random reads they may
take ~100ms on typical home/workstation SATA hard drive. That's the
difference between
timings I showed above.
Yes, I understand that Pg doesn't know (while planning the query) how many
pages will be hitted in shared buffers.
But I can't get why there is the same buffers count (35) in both plans...
And I can't get why I have different cost estimations...


I grant the theory that the repeated index probes in t1_b_a_idx should be
 more localized than those in t1_a_b_idx, but PG's optimizer doesn't


Yes, I see t1_a_b_idx and t1_b_a_idx have 3 levels in btree. For t1_a_b_idx
Pg have to read 1 (header) + 1 (root) + 1 (common level 1 node) + 7 * 2 =
17 pages in it
and for t1_b_a_idx 1 + 1 + 3 = 5 pages ('cause all 7 pairs of (a, b) are
located in one btree leaf node). 17 - 5 = 12 - this is the same difference
as we can see in
'explain (analyze, buffers)'.




 attempt to estimate such effects, and this example isn't doing much to
 convince me that it'd be worth the trouble.


In a real life situation I have two kinds of queries for that table:
* select ... from t1 where a in (...) and b = ?
* select ... from t1 where a = ? and b in (...)

I select fields from t1 that are not in indexes thus there is no 'Index
Only Scan', more random reads and performance impact of choosing t1_a_b_idx
in both queries is a bit lesser.

And I got the answer (PG's optimizer doesn't attempt to estimate such
effects) for my situation.
Thanks a lot.