Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-24 Thread Decibel!

On Apr 23, 2008, at 12:31 AM, Greg Smith wrote:
Jim threw out that you can just look at the page hit percentages
instead. That's not completely true.  If you've had some nasty
query blow out your buffer cache, or if the server has been up a
looong time and the total stats don't really reflect recent reality,
what's in the buffer cache and what the stats say has historically
been cached can diverge.



Yeah, I kinda hand-waved over that. Obviously we'd want to look at  
recent stats, not beginning of time.
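
For instance, a crude way to get a recent hit rate rather than the
since-startup totals is to snapshot the cumulative counters and diff them
after a representative interval; a rough sketch against the standard stats
views (statio_snap is just a made-up scratch table name):

-- Snapshot the cumulative per-table heap I/O counters.
CREATE TEMP TABLE statio_snap AS
SELECT relid, heap_blks_read, heap_blks_hit
FROM pg_statio_user_tables;

-- ...let a representative workload run for a while, then diff:
SELECT n.relid::regclass AS relation,
       n.heap_blks_hit  - s.heap_blks_hit  AS recent_hits,
       n.heap_blks_read - s.heap_blks_read AS recent_reads,
       round(100.0 * (n.heap_blks_hit - s.heap_blks_hit)
             / nullif((n.heap_blks_hit - s.heap_blks_hit)
                      + (n.heap_blks_read - s.heap_blks_read), 0), 1)
           AS recent_hit_pct
FROM pg_statio_user_tables n
JOIN statio_snap s USING (relid)
ORDER BY recent_reads DESC;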

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-23 Thread PFC



Example : let's imagine a cache priority setting.


Which we can presume the DBA will set incorrectly because the tools  
needed to set that right aren't easy to use.


LOL, yes.

Jim threw out that you can just look at the page hit percentages
instead. That's not completely true.  If you've had some nasty query
blow out your buffer cache, or if the server has been up a looong time
and the total stats don't really reflect recent reality, what's in the
buffer cache and what the stats say has historically been cached can
diverge.


Yes :
- perform huge query on table A
- table A is now in cache
- perform huge query on table B
- table B is now in cache, A isn't
- perform huge query on table A again
- postgres still thinks table A is cached and chooses a bad plan


This would not examine whatever is in the OS' cache, though.


	Yeah, but now that shared_buffers can be set to a large part of physical
RAM, does it still matter?
	Point is, postgres knows what is in shared_buffers, so it can make a
good decision. Postgres doesn't know what the OS has in cache, so it could
only make a wild guess. I would rather err on the side of safety...




I don't know that it's too unrealistic to model the OS as just being an  
extrapolated bigger version of the buffer cache.  I can think of a  
couple of ways those can diverge:


1) Popular pages that get high usage counts can end up with a higher
representation in shared_buffers than in the OS cache


2) If you've been doing something like a bulk update, you can have lots
of pages that have been written recently in the OS cache that aren't
really accounted for fully in shared_buffers, because they never get a
high enough usage count to stay there (only used once) but can fill the
OS cache as they're spooled up to write.


Especially on CHECKPOINT



Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-23 Thread Zeugswetter Andreas OSB SD

  The optimizer could then use a different (much lower) value of  
  random_page_cost for tables for which cache priority is set  
  highest since it would know.
 
 "cache priority" to me sounds like we're trying to influence caching
 behavior, which isn't what's happening. I do agree that we need a
 better way to tell the planner what tables are in memory.

I think overruling the cache manager to more aggressively cache certain
objects is a bad idea in general.
E.g. the above "telling the planner" can easily produce self-fulfilling
prophecies. Instead, if we find situations where the cache is not
used optimally, we should try to improve the cache algorithm.

A per-tablespace random_page_cost might make more sense, as Tom already
said.
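
Nothing like that exists in 8.3, but purely as a sketch of what a
per-tablespace knob could look like (hypothetical syntax; fast_disks and
slow_disks are made-up tablespace names):

-- Hypothetical per-tablespace cost settings (not a feature in 8.3):
ALTER TABLESPACE fast_disks SET (random_page_cost = 1.5, seq_page_cost = 1.0);
ALTER TABLESPACE slow_disks SET (random_page_cost = 6.0);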

E.g. Informix had a command to lock a table into memory, but apparently
it was so often misused that the feature was eventually removed again and
replaced by a better caching algorithm.

Andreas



Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-23 Thread Ron Mayer

Decibel! wrote:
 we can just look at 
the hit rate for the object. But we'd also need stats for how often we 
find pages for a relation in the OS cache, which no one has come up with 
a good method for.


Makes me wonder if we could (optionally, I guess, since timing
stuff is apparently slow on some systems) also save the
average time it took for a block to get ready in pg_statio_all_tables.
Or, if possible, save the averages for random and sequential pages
separately.

Then, rather than using guessed values in the config files, it seems
the plans could use the actual averages per table.

That would address both poor guesses for random_page_cost,
effective_cache_size, etc., as well as get things right
on systems where some tablespaces are fast and some are slow.
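
Purely as a sketch of the idea (these columns don't exist;
avg_random_read_ms and avg_seq_read_ms are made up here), the
planner-facing number would just fall out of the measurements:

-- Hypothetical timing columns added to pg_statio_all_tables; the
-- measured ratio would replace the guessed random_page_cost per table.
SELECT relname,
       avg_random_read_ms / nullif(avg_seq_read_ms, 0)
           AS measured_random_page_cost
FROM pg_statio_all_tables;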




[HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread PFC


It started with this query :

EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON  
(n.id=r.child_id) WHERE r.parent_id=16330;

                                QUERY PLAN
--------------------------------------------------------------------------
 Hash Join  (cost=370.96..496.29 rows=543 width=273) (actual time=18.887..21.164 rows=543 loops=1)
   Hash Cond: (r.child_id = n.id)
   ->  Index Scan using relations_unique on relations r  (cost=0.00..111.75 rows=543 width=58) (actual time=0.022..0.519 rows=543 loops=1)
         Index Cond: (parent_id = 16330)
   ->  Hash  (cost=243.76..243.76 rows=10176 width=215) (actual time=18.830..18.830 rows=10176 loops=1)
         ->  Seq Scan on nodes n  (cost=0.00..243.76 rows=10176 width=215) (actual time=0.006..5.135 rows=10176 loops=1)
 Total runtime: 21.453 ms

SET enable_hashjoin TO 0;

EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON  
(n.id=r.child_id) WHERE r.parent_id=16330;

                                QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..514.50 rows=543 width=273) (actual time=0.037..4.412 rows=543 loops=1)
   ->  Index Scan using relations_unique on relations r  (cost=0.00..111.75 rows=543 width=58) (actual time=0.023..0.476 rows=543 loops=1)
         Index Cond: (parent_id = 16330)
   ->  Index Scan using nodes_pkey on nodes n  (cost=0.00..0.73 rows=1 width=215) (actual time=0.004..0.005 rows=1 loops=543)
         Index Cond: (n.id = r.child_id)
 Total runtime: 4.638 ms

In order to have it use the fast plan I must set random_page_cost to 1,
which I absolutely don't want to do.

Setting effective_cache_size to huge values has no effect.
If I select a value of parent_id that has far fewer children, the index
will be used, but in this case I think the threshold is misplaced; it
should be slightly higher. Here about 5% of the values are selected. Hash
join becomes better at about 15% because the table is cached.

This is 8.3.

Perhaps there would be a need for a per-object setting
(object=table,index,partition) to alter the aggressiveness/laziness of the
page flushing and how long the pages for this object are kept in
shared_buffers... this would be used to modify random_page_cost on a
per-table/index/partition basis.


Example : let's imagine a cache priority setting.

- cache priority set to the minimum means this table is mostly write-only
- cache priority set to default would give current behaviour (which is  
correct in most cases)
- cache priority set to a high value would tell Postgres "I know this
table/index/partition is small and often accessed rather randomly, so I
want you to keep it in shared_buffers; purge it if you must, but otherwise
keep it in memory and flush something else with a lower cache_priority
instead."


The optimizer could then use a different (much lower) value of  
random_page_cost for tables for which cache priority is set highest  
since it would know.
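
To make that concrete, a sketch of what such per-object settings might
look like (entirely invented syntax; neither a cache_priority storage
parameter nor a per-relation random_page_cost exists):

-- Invented, illustrative syntax only:
ALTER TABLE nodes SET (cache_priority = high);
ALTER INDEX nodes_pkey SET (cache_priority = high);
-- ...from which the planner would derive something like
-- random_page_cost = 1.0 for these objects, since they are known cached.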


An alternative would be for the background writer to keep some stats and  
do the thing for us :


- begin bgwriter scan
- setup hashtable of [relid = page count]
- at each page that is scanned, increment page count for this relation  
(uses very little CPU)

- end bgwriter stats
- for each relation, compare the number of pages we found in  
shared_buffers with the number of pages in the relation and draw  
conclusions about how well cached the relation is

- update random_page_cost accordingly for this relation

This would not examine whatever is in the OS' cache, though.
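
For what it's worth, a similar per-relation count can already be
approximated from SQL with the contrib pg_buffercache view (at the cost of
the buffer locking it takes); a rough sketch, keeping in mind that
relpages is only the estimate from the last VACUUM/ANALYZE:

-- Fraction of each relation currently sitting in shared_buffers.
SELECT c.relname,
       count(*) AS buffered_pages,
       round(100.0 * count(*) / c.relpages, 1) AS pct_of_relation
FROM pg_buffercache b
JOIN pg_class c    ON b.relfilenode = c.relfilenode
JOIN pg_database d ON b.reldatabase = d.oid
WHERE d.datname = current_database()
  AND c.relpages > 0
GROUP BY c.relname, c.relpages
ORDER BY buffered_pages DESC
LIMIT 20;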




Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread Decibel!

On Apr 22, 2008, at 5:13 AM, PFC wrote:
In order to have it use the fast plan I must set random_page_cost  
to 1 which I absolutely don't want to do.

Setting effective_cache_size to huge values has no effect.
If I select a value of parent_id that has far fewer children, the
index will be used, but in this case I think the threshold is
misplaced; it should be slightly higher. Here we have about 5% of
values selected. Hash join becomes better at about 15% because the
table is cached.

This is 8.3.

Perhaps there would be a need for a per-object setting
(object=table,index,partition) to alter the aggressiveness/laziness
of the page flushing and how long the pages for this object are
kept in shared_buffers... this would be used to modify
random_page_cost on a per-table/index/partition basis.


Example : let's imagine a cache priority setting.

- cache priority set to the minimum means this table is mostly  
write-only
- cache priority set to default would give current behaviour  
(which is correct in most cases)
- cache priority set to a high value would tell Postgres I know  
this table/index/partition is small and often accessed rather  
randomly, so I want you to keep it in shared_buffers, purge it if  
you must but otherwise keep it in memory, flush something else  
instead which has lower cache_priority.


The optimizer could then use a different (much lower) value of  
random_page_cost for tables for which cache priority is set  
highest since it would know.


"cache priority" to me sounds like we're trying to influence caching
behavior, which isn't what's happening. I do agree that we need a
better way to tell the planner what tables are in memory.


An alternative would be for the background writer to keep some  
stats and do the thing for us :


- begin bgwriter scan
- setup hashtable of [relid = page count]
- at each page that is scanned, increment page count for this  
relation (uses very little CPU)

- end bgwriter stats
- for each relation, compare the number of pages we found in  
shared_buffers with the number of pages in the relation and draw  
conclusions about how well cached the relation is

- update random_page_cost accordingly for this relation

This would not examine whatever is in the OS' cache, though.


Actually, there's no need for bgwriter to do that; we can just look  
at the hit rate for the object. But we'd also need stats for how  
often we find pages for a relation in the OS cache, which no one has  
come up with a good method for.
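
e.g. something along these lines from the standard stats views (cumulative
since the last stats reset, so subject to the staleness caveat above):

-- Heap buffer hit percentage per table, from the cumulative counters.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_hit + heap_blks_read DESC;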

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread Gurjeet Singh
On Wed, Apr 23, 2008 at 12:11 AM, Decibel! [EMAIL PROTECTED] wrote:

 On Apr 22, 2008, at 5:13 AM, PFC wrote:

  In order to have it use the fast plan I must set random_page_cost to 1
  which I absolutely don't want to do.
  Setting effective_cache_size to huge values has no effect.
  If I select a value of parent_id that has far fewer children, the index
  will be used, but in this case I think the threshold is misplaced; it should
  be slightly higher. Here we have about 5% of values selected. Hash join
  becomes better at about 15% because the table is cached.
  This is 8.3.
 
  Perhaps there would be a need for a per-object setting
  (object=table,index,partition) to alter the aggressiveness/laziness of the
  page flushing and how long the pages for this object are kept in
  shared_buffers... this would be used to modify random_page_cost on a
  per-table/index/partition basis.
 
  Example : let's imagine a cache priority setting.
 
  - cache priority set to the minimum means this table is mostly
  write-only
  - cache priority set to default would give current behaviour (which is
  correct in most cases)
  - cache priority set to a high value would tell Postgres I know this
  table/index/partition is small and often accessed rather randomly, so I want
  you to keep it in shared_buffers, purge it if you must but otherwise keep it
  in memory, flush something else instead which has lower cache_priority.
 
  The optimizer could then use a different (much lower) value of
  random_page_cost for tables for which cache priority is set highest since
  it would know.
 

 "cache priority" to me sounds like we're trying to influence caching
 behavior, which isn't what's happening. I do agree that we need a better way
 to tell the planner what tables are in memory.

  An alternative would be for the background writer to keep some stats and
  do the thing for us :
 
  - begin bgwriter scan
  - setup hashtable of [relid = page count]
  - at each page that is scanned, increment page count for this relation
  (uses very little CPU)
  - end bgwriter stats
  - for each relation, compare the number of pages we found in
  shared_buffers with the number of pages in the relation and draw conclusions
  about how well cached the relation is
  - update random_page_cost accordingly for this relation
 
  This would not examine whatever is in the OS' cache, though.
 

 Actually, there's no need for bgwriter to do that; we can just look at the
 hit rate for the object. But we'd also need stats for how often we find
 pages for a relation in the OS cache, which no one has come up with a good
 method for.


Something related... a per-relation cost setting would also allow users to
tune based on the kind of storage those objects are stored on. Using
tablespaces, users can choose to place some objects on really expensive/
really fast storage, and other (not so hot) objects on slower/cheaper
storage.

-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes:
 "cache priority" to me sounds like we're trying to influence caching
 behavior, which isn't what's happening. I do agree that we need a
 better way to tell the planner what tables are in memory.

What's been discussed in the past is per-tablespace settings for
random_page_cost and friends.  That was meant to cover actual disk
hardware differences, but could be (ab)used to handle the case of
heavily and not so heavily used tables.

Per-table sounds kinda bogus to me; such settings would probably reflect
wishful thinking on the part of the DBA more than reality.

regards, tom lane



Re: [HACKERS] Per-table random_page_cost for tables that we know are always cached

2008-04-22 Thread Greg Smith

On Tue, 22 Apr 2008, PFC wrote:


Example : let's imagine a cache priority setting.


Which we can presume the DBA will set incorrectly because the tools needed 
to set that right aren't easy to use.


An alternative would be for the background writer to keep some stats and do 
the thing for us :

- begin bgwriter scan
- setup hashtable of [relid = page count]
- at each page that is scanned, increment page count for this 
relation...


I've already got a plan sketched out that does this which I didn't manage
to get finished in time for 8.3.  What I wanted it for was not for this
purpose, but for instrumentation of what's in the cache that admins can
look at.  Right now you can get that out of pg_buffercache, but that's
kind of intrusive because of the locks it takes.  In many cases I'd be
perfectly happy with an approximation of what's inside the buffer cache,
accumulated while the page header is being locked anyway as the BGW passes
over it.  And as you note, having this data available can be handy for
internal self-tuning as well once it's out there.


Jim threw out that you can just look at the page hit percentages instead.
That's not completely true.  If you've had some nasty query blow out your
buffer cache, or if the server has been up a looong time and the total
stats don't really reflect recent reality, what's in the buffer cache and
what the stats say has historically been cached can diverge.



This would not examine whatever is in the OS' cache, though.


I don't know that it's too unrealistic to model the OS as just being an 
extrapolated bigger version of the buffer cache.  I can think of a couple 
of ways those can diverge:


1) Popular pages that get high usage counts can end up with a higher
representation in shared_buffers than in the OS cache


2) If you've been doing something like a bulk update, you can have lots
of pages that have been written recently in the OS cache that aren't
really accounted for fully in shared_buffers, because they never get a
high enough usage count to stay there (only used once) but can fill the OS
cache as they're spooled up to write.


I'm not sure that either of these cases is so strong that it invalidates
your basic idea though.  There's a pending 8.4 TODO to investigate whether
increasing the maximum usage count a buffer can get would be an
improvement.  If that number got bumped up I could see (2) becoming more
of a problem.


I'd be somewhat concerned about turning this mechanism on by default
though, at least at first.  A hybrid approach that gives the DBA some
control might work well.  Maybe have an "adjust estimates for cache
contents" knob that you can toggle on a per-session or per-table basis?
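
Something like this, say (knob name and syntax entirely invented, just to
give the idea a concrete shape):

-- Invented GUC / storage parameter names, per-session and per-table:
SET use_cache_estimates = on;                      -- session-wide toggle
ALTER TABLE nodes SET (use_cache_estimates = on);  -- opt in one table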


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
