Re: [GENERAL] Convincing the query planner to play nice

2013-08-21 Thread Tim Kane

Thanks Jeff. These queries in particular relate to a set of data that is
rebuilt on a periodic basis. For all intents and purposes, the data is
newly populated and unlikely to reside in cache - hence the need to
perform my tests under similar conditions.

It's probably better that I adjust the random_page_cost for that
particular session, and leave things be otherwise.
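
Something along these lines, per session (the value here is only illustrative -
per Jeff's suggestion that the cold-cache figure should be roughly 20x my
current 2.2):

=# set random_page_cost = 44;
=# explain (analyse,buffers) select * from archive.users inner join live.addresses using (address_id);
=# reset random_page_cost;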


Cheers.



On 13/08/2013 17:27, Jeff Janes jeff.ja...@gmail.com wrote:

On Sat, Aug 10, 2013 at 10:32 AM, Timothy Kane tim.k...@gmail.com wrote:

 Hi all,

 I seem to be having some grief with the 9.1.9 query planner favouring an
 index scan + merge join, over a sequential scan + hash join.
 Logically I would have considered the index+merge to be faster, as
suggested
 by the explain output - but in practice, it is in fact slower by orders
of
 magnitude.

 In my timings below, I've tried to reduce the impact of any OS or
 shared_buffer level caching (restarting postgres, and flushing OS cache
 between queries).


Are you sure that that is the right thing to do?  It seems unlikely
that your production server is constantly executing your query from a
cold start.  Why test it that way?


 I've provided my settings as shown:


 =# show seq_page_cost;
  seq_page_cost 
 ---------------
  1
 (1 row)

 Time: 0.355 ms
 =# show random_page_cost;
  random_page_cost 
 ------------------
  2.2
 (1 row)

Given that you are testing your query from a cold start (and assuming
against odds that that is the correct thing to do), 2.2 is probably a
factor of 20 too small for this setting.

Cheers,

Jeff






Re: [GENERAL] Convincing the query planner to play nice

2013-08-13 Thread Jeff Janes
On Sat, Aug 10, 2013 at 10:32 AM, Timothy Kane tim.k...@gmail.com wrote:

 Hi all,

 I seem to be having some grief with the 9.1.9 query planner favouring an
 index scan + merge join, over a sequential scan + hash join.
 Logically I would have considered the index+merge to be faster, as suggested
 by the explain output - but in practice, it is in fact slower by orders of
 magnitude.

 In my timings below, I've tried to reduce the impact of any OS or
 shared_buffer level caching (restarting postgres, and flushing OS cache
 between queries).


Are you sure that that is the right thing to do?  It seems unlikely
that your production server is constantly executing your query from a
cold start.  Why test it that way?


 I've provided my settings as shown:


 =# show seq_page_cost;
  seq_page_cost 
 ---------------
  1
 (1 row)

 Time: 0.355 ms
 =# show random_page_cost;
  random_page_cost 
 ------------------
  2.2
 (1 row)

Given that you are testing your query from a cold start (and assuming
against odds that that is the correct thing to do), 2.2 is probably a
factor of 20 too small for this setting.

Cheers,

Jeff




Re: [GENERAL] Convincing the query planner to play nice

2013-08-13 Thread Jeff Janes
On Sat, Aug 10, 2013 at 5:24 PM, Tim Kane tim.k...@gmail.com wrote:



 Again, just thinking out loud here..   In a scenario where specific
 clustering isn't an option...
 I wonder if the query planner should consider the physical
 distribution/ordering of values on disk, and use that as a factor when
 applying the random_page_cost in the QEP's?

It does do that, based on the correlation column in pg_stats.
However, because your original random_page_cost is already very close
to seq_page_cost, this adjustment doesn't have a huge effect in your
case.  I don't know how much of an effect it would have even then,
because of the range overlap issue that Tom mentions.
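
If you want to see what the planner is working from there, that statistic is
visible directly; a sketch, using the table and column names from your plans:

=# select tablename, attname, correlation
   from pg_stats
   where attname = 'address_id' and tablename in ('users', 'addresses');

A correlation near 1 (or -1) means the heap order closely tracks the value
order; near 0 means effectively random placement.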

Cheers,

Jeff




[GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Timothy Kane

Hi all,

I seem to be having some grief with the 9.1.9 query planner favouring an index 
scan + merge join, over a sequential scan + hash join.
Logically I would have considered the index+merge to be faster, as suggested by 
the explain output - but in practice, it is in fact slower by orders of 
magnitude.

In my timings below, I've tried to reduce the impact of any OS or shared_buffer 
level caching (restarting postgres, and flushing OS cache between queries).

I've provided my settings as shown:


=# show seq_page_cost;
 seq_page_cost 
---------------
 1
(1 row)

Time: 0.355 ms
=# show random_page_cost;
 random_page_cost 
------------------
 2.2
(1 row)

Time: 0.084 ms
=# show cpu_tuple_cost;
 cpu_tuple_cost 
----------------
 0.01
(1 row)

Time: 0.077 ms
=# show cpu_index_tuple_cost;
 cpu_index_tuple_cost 
----------------------
 0.005
(1 row)

Time: 0.065 ms
=# show cpu_operator_cost;
 cpu_operator_cost 
-------------------
 0.0025
(1 row)

Time: 0.064 ms
=# show effective_cache_size;
 effective_cache_size 
----------------------
 12GB
(1 row)



-- QEP's for 9.1.9
=# explain (analyse,buffers) select * from archive.users inner join 
live.addresses using (address_id);

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=18.79..159615.38 rows=1887786 width=131) (actual time=0.023..602386.955 rows=1862872 loops=1)
   Merge Cond: (addresses.address_id = users.address_id)
   Buffers: shared hit=1655113 read=382486
   ->  Index Scan using addresses_pkey on addresses  (cost=0.00..52609.75 rows=1872220 width=22) (actual time=0.008..1440.294 rows=1872220 loops=1)
         Buffers: shared hit=473352 read=18328
   ->  Index Scan using address_id_users on users  (cost=0.00..3075311.08 rows=73741592 width=117) (actual time=0.005..598455.258 rows=1862873 loops=1)
         Buffers: shared hit=1181761 read=364158
 Total runtime: 602548.352 ms
(8 rows)

Time: 603090.399 ms



=# set enable_indexscan=off;
SET
Time: 0.219 ms
=# explain (analyse,buffers) select * from archive.users inner join 
live.addresses using (address_id);
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=55340.95..2707034.64 rows=1887786 width=131) (actual time=1279.659..36886.595 rows=1862872 loops=1)
   Hash Cond: (users.address_id = addresses.address_id)
   Buffers: shared hit=6 read=1079019
   ->  Seq Scan on users  (cost=0.00..1803222.92 rows=73741592 width=117) (actual time=5.082..26430.189 rows=73741544 loops=1)
         Buffers: shared hit=2 read=1065805
   ->  Hash  (cost=31938.20..31938.20 rows=1872220 width=22) (actual time=1273.432..1273.432 rows=1872220 loops=1)
         Buckets: 262144  Batches: 1  Memory Usage: 112381kB
         Buffers: shared hit=2 read=13214
         ->  Seq Scan on addresses  (cost=0.00..31938.20 rows=1872220 width=22) (actual time=7.190..553.516 rows=1872220 loops=1)
               Buffers: shared hit=2 read=13214
 Total runtime: 37014.912 ms
(11 rows)

Time: 37518.029 ms





The only way I can artificially convince the planner to choose the sequential 
scan method is to increase cpu_index_tuple_cost from 0.005 to 1.4.
This suggests something is really, really wrong with the statistics on this 
table, as that shouldn't be necessary.



Interestingly, on another instance of this same database running on postgres 
8.3.8, the query planner correctly chooses the sequential scan method - having 
more sane cost estimates for the index scan method.

-- QEP's for 8.3.8
=# explain select * from archive.users inner join live.addresses using 
(address_id);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=55340.95..2783655.68 rows=1949180 width=133)
   Hash Cond: (users.address_id = addresses.address_id)
   ->  Seq Scan on users  (cost=0.00..1879254.32 rows=73739432 width=119)
   ->  Hash  (cost=31938.20..31938.20 rows=1872220 width=22)
         ->  Seq Scan on addresses  (cost=0.00..31938.20 rows=1872220 width=22)
(5 rows)

=# set enable_seqscan=off;
SET
=# explain select * from archive.users inner join live.addresses using 
(address_id);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Merge Join  (cost=6.98..3496768.28 rows=1949180 width=133)
   Merge Cond: (addresses.address_id = users.address_id)
   ->  Index Scan using addresses_pkey on 

Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tom Lane
Timothy Kane tim.k...@gmail.com writes:
 I seem to be having some grief with the 9.1.9 query planner favouring an 
 index scan + merge join, over a sequential scan + hash join.

I believe the reason it's preferring the merge join plan is that it thinks
the executor will be able to terminate the merge join early as a
consequence of the range of join keys in addresses being only a fraction
of the range of join keys in users.  Notice that the total estimated
cost for the merge join is just a fraction of the full estimated cost of
the indexscan on users; the only way that's possible is if the indexscan
on users doesn't have to run through all of the table.  Probably, the
range of join keys is wider than the planner thinks and so the merge join
can't terminate early.  The fix therefore is to crank the stats target for
addresses up high enough that you get a reasonable value in pg_statistic
for the largest address_id value (look at the last histogram entry).
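
Concretely, that would look something like this (the target value here is only
illustrative - pick whatever gets the histogram close enough to reality):

=# alter table live.addresses alter column address_id set statistics 5000;
=# analyse live.addresses;
=# select histogram_bounds from pg_stats
   where tablename = 'addresses' and attname = 'address_id';

Then compare the last histogram bound against the true max(address_id).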

 Interestingly, on another instance of this same database running on postgres 
 8.3.8, the query planner correctly chooses the sequential scan method - 
 having more sane cost estimates for the index scan method.

I think the 8.3 planner didn't take this effect into account.  Or maybe it
did, but by chance the upper histogram value is closer to reality on the
older database.

regards, tom lane




Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tim Kane

Okay, so I've played with this a bit more. I think I have it cracked.

I had to increase random_page_cost and significantly reduce 
effective_cache_size in order for the planner to prefer a sequential scan. (It 
turns out this is what the 8.3.8 instance was doing all along, so it's not 
anything specific to 9.1.9).

Assuming that effective_cache_size has no bearing on postgres behaviour outside 
of query planning, I guess that's ok. It must simply be that the index-based 
method causes a lot of random-order reads of the relation.
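
For reference, the experiment was along these lines (session-local, and the
exact values here are illustrative rather than what I'd commit to):

=# set random_page_cost = 4;
=# set effective_cache_size = '1GB';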

A better way, however, seems to be clustering the table on the address_id 
index. This seems to have done the job nicely, allowing the pages to be read 
in the order they're recorded on disk. In fact, it performs incredibly well 
now. Who knew! :)
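
For anyone following along, the clustering itself was just this (index name as
per the plans below; note that CLUSTER takes an exclusive lock and rewrites the
table, so it's not free):

=# cluster archive.users using address_id_users;
=# analyse archive.users;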


=# explain (analyse,buffers) select * from archive.users inner join 
live.addresses using (address_id);
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=756.82..151865.53 rows=1982043 width=131) (actual time=6.417..3851.314 rows=1862872 loops=1)
   Merge Cond: (addresses.address_id = users.address_id)
   Buffers: shared hit=10 read=65799
   ->  Index Scan using addresses_pkey on addresses  (cost=0.00..52602.26 rows=1872220 width=22) (actual time=0.011..638.291 rows=1872220 loops=1)
         Buffers: shared hit=6 read=18328
   ->  Index Scan using address_id_users on users  (cost=0.00..2630954.66 rows=74700184 width=117) (actual time=6.391..1657.213 rows=1862873 loops=1)
         Buffers: shared hit=4 read=47471
 Total runtime: 3954.146 ms
(8 rows)



Again, just thinking out loud here..   In a scenario where specific clustering 
isn't an option...
I wonder if the query planner should consider the physical 
distribution/ordering of values on disk, and use that as a factor when applying 
the random_page_cost in the QEP's?

I'm sure I've missed something fundamental here, happy to be corrected :)

Cheers,

Tim 







On 10 Aug 2013, at 18:32, Timothy Kane tim.k...@gmail.com wrote:

 
 Hi all,
 
 I seem to be having some grief with the 9.1.9 query planner favouring an 
 index scan + merge join, over a sequential scan + hash join.
 Logically I would have considered the index+merge to be faster, as suggested 
 by the explain output - but in practice, it is in fact slower by orders of 
 magnitude.
 
 In my timings below, I've tried to reduce the impact of any OS or 
 shared_buffer level caching (restarting postgres, and flushing OS cache 
 between queries).
 
 I've provided my settings as shown:
 
 
 =# show seq_page_cost;
  seq_page_cost 
 ---------------
  1
 (1 row)
 
 Time: 0.355 ms
 =# show random_page_cost;
  random_page_cost 
 ------------------
  2.2
 (1 row)
 
 Time: 0.084 ms
 =# show cpu_tuple_cost;
  cpu_tuple_cost 
 ----------------
  0.01
 (1 row)
 
 Time: 0.077 ms
 =# show cpu_index_tuple_cost;
  cpu_index_tuple_cost 
 ----------------------
  0.005
 (1 row)
 
 Time: 0.065 ms
 =# show cpu_operator_cost;
  cpu_operator_cost 
 -------------------
  0.0025
 (1 row)
 
 Time: 0.064 ms
 =# show effective_cache_size;
  effective_cache_size 
 ----------------------
  12GB
 (1 row)
 
 
 
 -- QEP's for 9.1.9
 =# explain (analyse,buffers) select * from archive.users inner join 
 live.addresses using (address_id);
 
                                                        QUERY PLAN
 ------------------------------------------------------------------------------------------------------------------------
  Merge Join  (cost=18.79..159615.38 rows=1887786 width=131) (actual time=0.023..602386.955 rows=1862872 loops=1)
    Merge Cond: (addresses.address_id = users.address_id)
    Buffers: shared hit=1655113 read=382486
    ->  Index Scan using addresses_pkey on addresses  (cost=0.00..52609.75 rows=1872220 width=22) (actual time=0.008..1440.294 rows=1872220 loops=1)
          Buffers: shared hit=473352 read=18328
    ->  Index Scan using address_id_users on users  (cost=0.00..3075311.08 rows=73741592 width=117) (actual time=0.005..598455.258 rows=1862873 loops=1)
          Buffers: shared hit=1181761 read=364158
  Total runtime: 602548.352 ms
 (8 rows)
 
 Time: 603090.399 ms
 
 
 
 =# set enable_indexscan=off;
 SET
 Time: 0.219 ms
 =# explain (analyse,buffers) select * from archive.users inner join 
 live.addresses using (address_id);
                                                       QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=55340.95..2707034.64 

Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tim Kane

Ahh, thanks Tom.
I hadn't seen your email before I posted my own followup.

I guess the clustering approach managed to work around the need to mess with 
the statistics target.  I did previously increase the target to 1000 (from 100) 
for that field, with no impact - but this is an aspect of tuning I'm not so 
familiar with; I didn't consider pushing it all the way to 11.


On 11 Aug 2013, at 00:28, Tom Lane t...@sss.pgh.pa.us wrote:

 Timothy Kane tim.k...@gmail.com writes:
 I seem to be having some grief with the 9.1.9 query planner favouring an 
 index scan + merge join, over a sequential scan + hash join.
 
 I believe the reason it's preferring the merge join plan is that it thinks
 the executor will be able to terminate the merge join early as a
 consequence of the range of join keys in addresses being only a fraction
 of the range of join keys in users.  Notice that the total estimated
 cost for the merge join is just a fraction of the full estimated cost of
 the indexscan on users; the only way that's possible is if the indexscan
 on users doesn't have to run through all of the table.  Probably, the
 range of join keys is wider than the planner thinks and so the merge join
 can't terminate early.  The fix therefore is to crank the stats target for
 addresses up high enough that you get a reasonable value in pg_statistic
 for the largest address_id value (look at the last histogram entry).
 
 Interestingly, on another instance of this same database running on postgres 
 8.3.8, the query planner correctly chooses the sequential scan method - 
 having more sane cost estimates for the index scan method.
 
 I think the 8.3 planner didn't take this effect into account.  Or maybe it
 did, but by chance the upper histogram value is closer to reality on the
 older database.
 
   regards, tom lane





Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tom Lane
Tim Kane tim.k...@gmail.com writes:
 I guess the clustering approach managed to work around the need to mess with 
 the statistics target.  I did previously increase the target to 1000 (from 
 100) for that field, with no impact - but this is an aspect of tuning I'm 
 not so familiar with; I didn't consider pushing it all the way to 11.

Yeah, I had actually started to write an email recommending that you dial
down effective_cache_size and increase random_page_cost, before I noticed
the discrepancy in the merge join cost and realized what was really going
on.

The question now is why you had those settings like that before, and
whether changing them back in the direction of the defaults might not be
pessimizing the behavior for other queries.  If you have a lot of RAM and
mostly-cached queries, the previous settings didn't sound unreasonable.

regards, tom lane




Re: [GENERAL] Convincing the query planner to play nice

2013-08-10 Thread Tim Kane

Yep, the effective_cache_size setting was specifically because we have lots of 
RAM to play with, and we want to ensure we are caching wherever possible.
The reduced random_page_cost was primarily to allow for the fact that we're 
using relatively fast disk (indexes are on SSD, table data on SAS drives), 
though I didn't fully appreciate how the combination of these settings can 
influence the preference towards a sequential vs index scan.
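
As an aside, since the indexes and table data live on different storage, I 
gather the page-cost settings can also be attached per tablespace rather than 
set globally - a sketch, with a hypothetical tablespace name:

=# alter tablespace ssd_indexes set (random_page_cost = 1.5);

That way the cheap-random-I/O assumption only applies to the tablespace where 
it's actually true.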

I think I'll stop tweaking for now, and see how it performs in the next few 
days. I feel like I have a much better handle on how the planner is pulling 
everything together. Cheers.

Tim


On 11 Aug 2013, at 01:38, Tom Lane t...@sss.pgh.pa.us wrote:

 Tim Kane tim.k...@gmail.com writes:
 I guess the clustering approach managed to work around the need to mess with 
 the statistics target.  I did previously increase the target to 1000 (from 
 100) for that field, with no impact - but this is an aspect of tuning I'm 
 not so familiar with; I didn't consider pushing it all the way to 11.
 
 Yeah, I had actually started to write an email recommending that you dial
 down effective_cache_size and increase random_page_cost, before I noticed
 the discrepancy in the merge join cost and realized what was really going
 on.
 
 The question now is why you had those settings like that before, and
 whether changing them back in the direction of the defaults might not be
 pessimizing the behavior for other queries.  If you have a lot of RAM and
 mostly-cached queries, the previous settings didn't sound unreasonable.
 
   regards, tom lane


