> On 08/14/2011 12:31 AM, Heikki Linnakangas wrote:
>> The same idea could of course be used to calculate the effective cache
>> hit ratio for each table. Cache hit ratio would have the problem of
>> feedback loops, though.
>
> Yeah, I'm not excited about making the planner and statistics more
> dynamic. Feedback loops and plan instability are not fun.
I might be a little out of my league here, but I was thinking about the cache hit ratio and feedback loops. I understand fully automatic tuning would be hard. But making automatic tuning easier (with pg_tune, for example) would be a big plus for most use cases.

To make it easier to tune the page read costs automatically, it would be nice to have four variables instead of the current two (see the sketch after the list):
  - random_page_cost is the cost of reading a random page from storage. Currently it is not; it is the cost of accessing a random page, taking into account that it might be in memory.
  - seq_page_cost is the cost of reading pages sequentially from storage
  - memory_page_cost is the cost of reading a page in memory
  - cache_hit_ratio is the expected cache hit ratio
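
As a concrete sketch, the knobs might look like this (memory_page_cost and cache_hit_ratio are hypothetical GUCs that do not exist in any released PostgreSQL, and all the numbers are made up):

    -- hypothetical settings under this proposal
    SET seq_page_cost = 1.0;      -- pure storage cost, sequential
    SET random_page_cost = 40.0;  -- pure storage cost, random: much higher
                                  -- than today's blended default of 4
    SET memory_page_cost = 0.1;   -- new: cost of a page already in memory
    SET cache_hit_ratio = 0.9;    -- new: expected fraction of cached accesses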

memory_page_cost would be server-global, random_page_cost and seq_page_cost tablespace-specific, and cache_hit_ratio relation-specific. You would get the current behavior by setting the *_page_costs realistically and setting cache_hit_ratio globally so that the expected random_page_cost / seq_page_cost ratio stays the same as now.
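
In other words, the per-page charge the planner uses would presumably become a weighted blend, something like (my sketch, not existing planner code):

    effective_page_cost = cache_hit_ratio * memory_page_cost
                        + (1 - cache_hit_ratio) * storage_page_cost

where storage_page_cost is random_page_cost or seq_page_cost depending on the access pattern. Note that setting memory_page_cost = 0 scales both storage costs by the same (1 - cache_hit_ratio) factor, which keeps the random/seq ratio exactly as it is today.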

The biggest advantage of this would be that the correct values are much easier to detect automatically than in the current situation: the *_page_costs can be measured with IO speed testing, and cache hit ratios estimated from the pg_statio_* views. PostgreSQL itself should not tune them automatically, at least not cache_hit_ratio, as that opens the door to feedback loops and plan instability. The variables would also be much easier to understand.
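
For example, a starting point for measuring per-relation hit ratios could be something like this (keeping in mind that pg_statio_* only counts shared_buffers hits, so it cannot see the OS page cache):

    -- observed shared-buffers hit ratio per table
    SELECT relname,
           heap_blks_hit::float8
               / nullif(heap_blks_hit + heap_blks_read, 0) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC;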

There is the question of whether one should be allowed to tune the *_page_costs at all. Unless I am missing something, the correct values can be detected programmatically, and they do not change unless the hardware changes. The cache hit ratio is the real reason they are currently so important for tuning.
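
Back-of-the-envelope, assuming a disk that does 100 random IOPS and 100 MB/s sequentially: 100 MB/s is about 12800 8 kB pages per second, so a random page read is roughly 128 times as expensive as a sequential one. That is a far cry from the default random_page_cost / seq_page_cost ratio of 4, and the gap is exactly the caching assumption baked into the defaults.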

An example of why the current random_page_cost and seq_page_cost tuning is not adequate is that you can only set random_page_cost per tablespace. That would make perfect sense if random_page_cost were the cost of accessing a page in storage. But it is not; it is a combination of that and caching effects, so it actually varies per relation (and over time). How do you set it correctly for a query where one relation is fully cached and another is not?
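
That is, today you can do this (fast_ssd is a made-up tablespace name):

    -- per-tablespace storage costs exist since 9.0
    ALTER TABLESPACE fast_ssd SET (random_page_cost = 2.0);

but there is no per-relation equivalent for the caching component, which is what actually varies from table to table.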

Another problem is that if you set random_page_cost == seq_page_cost, you are effectively saying that everything is in cache. But if everything is in cache, the cost of a page access relative to the cpu_*_costs is way off. The more random_page_cost and seq_page_cost differ, the more they represent storage access costs; when they are equal, they represent the memory page cost. There can be an order of magnitude of difference between a storage page cost and a memory page cost, so it is hard to tune the cpu_*_costs realistically for cases where data is sometimes in cache and sometimes not.
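
A rough illustration with the defaults (seq_page_cost = 1, cpu_tuple_cost = 0.01): every page fetch is charged 100 times the per-tuple CPU cost. If the page is really coming from shared_buffers, the true gap between "fetch a page" and "process a tuple" is far smaller, so even with random_page_cost = seq_page_cost = 1 the page costs remain overstated relative to the cpu_*_costs.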

Ok, enough hand waving for one post :) Sorry if this all is obvious / discussed before. My googling didn't turn up anything directly related, although these have some similarity:
 - Per-table random_page_cost for tables that we know are always cached
   [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php]
 - Script to compute random page cost
   [http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php]
 - The science of optimization in practical terms?
   [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php],
   getting really interesting starting from here:
   [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]

 - Anssi

