Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-24 Thread Vitalii Tymchyshyn

Hello.

As for me, all this "hot" detection looks rather uncertain and dynamic.
Two things that I could use directly right now (and they are needed as a 
pair) are:
1) Per-table/index/database buffer pools (split shared_buffers into parts 
and allow specifying which index/table/database goes where)

2) Per-table/index cost settings

If I had these, I could allocate specific buffer pools for the tables and 
indexes that MUST stay hot in memory, and set low costs for those specific 
tables.
P.S. A third thing that would be a great companion to these two is a "load 
on startup" flag to automatically populate the buffer pools with a fast 
sequential read, but that can easily be emulated with a statement (a rough 
sketch follows below).
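
For reference, the closest things available today are the per-tablespace
cost options (PostgreSQL 9.0) plus a warming query. A rough sketch -- the
table and tablespace names here are just examples:

-- put the must-stay-hot objects on their own tablespace and lower its costs
ALTER TABLE important_lookup SET TABLESPACE hot_space;
ALTER TABLESPACE hot_space SET (random_page_cost = 1.0, seq_page_cost = 1.0);

-- crude "load on startup": a full sequential read to warm the caches
SELECT count(*) FROM important_lookup;

This does not split shared_buffers, of course; it only nudges the planner
and the caches in the direction described above.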


Best regards, Vitalii Tymchyshyn

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Josh Berkus

 Well, all of that stuff sounds impractically expensive to me... but I
 just work here.

I'll point out that the simple version, which just checks for hot tables
and indexes, would improve estimates greatly and be a LOT less
complicated than these proposals.  Certainly having some form of
block-based or range-based stats would be better, but it also sounds
hard enough to maybe never get done.

Having user-accessible hot stats would also be useful to DBAs.
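
A DBA can already get a crude, point-in-time version of this by hand.
A sketch, assuming the pg_buffercache contrib module is installed (and
remembering it only sees shared_buffers, not the OS cache):

SELECT c.relname, count(*) AS buffers_cached
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = c.relfilenode
 AND b.reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers_cached DESC
LIMIT 10;

What's missing is a tracked-over-time, planner-visible version of that.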

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-23 Thread Robert Haas
On Mon, May 23, 2011 at 3:08 PM, Josh Berkus j...@agliodbs.com wrote:

 Well, all of that stuff sounds impractically expensive to me... but I
 just work here.

 I'll point out that the simple version, which just checks for hot tables
 and indexes, would improve estimates greatly and be a LOT less
 complicated than these proposals.

I realize I'm sounding like a broken record here, but as far as I can
tell there is absolutely zero evidence that that would be better.  I'm
sure you're in good company thinking so, but the list of things that
could skew (or should I say, screw) the estimates is long and painful;
and if those estimates are wrong, you'll end up with something that is
both worse and less predictable than the status quo.  First, I haven't
seen a shred of hard evidence that the contents of the buffer cache or
OS cache are stable enough to be relied upon, and we've repeatedly
discussed workloads where that might not be true.  Has anyone done a
systematic study of this on a variety of real production systems?  If so,
the results haven't been posted here, at least not that I can recall.
Second, even if we were willing to accept that we could obtain
relatively stable and accurate measurements of this data, who is to
say that basing plans on it would actually result in an improvement in
plan quality?  That may seem obvious, but I don't think it is.  The
proposed method is a bit like trying to determine the altitude of a
hot air balloon by throwing the ballast over the side and timing how
long it takes to hit the ground.  Executing plans that are based on
the contents of the cache will change the contents of the cache, which
will in turn change the plans.  The idea that we can know, without any
experimentation, how that's going to shake out, seems to me to be an
exercise in unjustified optimism of the first order.

Sorry to sound grumpy and pessimistic, but I really think we're
letting our enthusiasm get way, way ahead of the evidence.
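
If someone does want to collect that evidence, even a crude approach would
be a start -- a sketch, again assuming pg_buffercache, sampled periodically
(say, from cron) so successive snapshots can be compared for drift:

CREATE TABLE buffercache_sample (
    sample_time timestamptz NOT NULL DEFAULT now(),
    relfilenode oid,
    buffers     integer
);

INSERT INTO buffercache_sample (relfilenode, buffers)
SELECT relfilenode, count(*)
FROM pg_buffercache
WHERE relfilenode IS NOT NULL
GROUP BY relfilenode;

Until numbers like that from real production systems show how stable the
cached set actually is, we're guessing.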

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Robert Haas
On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
 Jim Nasby wrote:
 I think the challenge there would be how to define the scope of the
 hot-spot. Is it the last X pages? Last X serial values? Something like
 correlation?

 Hmm... it would be interesting if we had average relation access times for
 each stats bucket on a per-column basis; that would give the planner a
 better idea of how much IO overhead there would be for a given WHERE clause

 You've already given one reasonable first answer to your question here.  If
 you defined a usage counter for each histogram bucket, and incremented that
 each time something from it was touched, that could lead to a very rough way
 to determine access distribution.  Compute a ratio of the counts in those
 buckets, then have an estimate of the total cached percentage; multiplying
 the two will give you an idea how much of that specific bucket might be in
 memory.  It's not perfect, and you need to incorporate some sort of aging
 method to it (probably weighted average based), but the basic idea could
 work.

Maybe I'm missing something here, but it seems like that would be
nightmarishly slow.  Every time you read a tuple, you'd have to look
at every column of the tuple and determine which histogram bucket it
was in (or, presumably, which MCV it is, since those aren't included
in working out the histogram buckets).  That seems like it would slow
down a sequential scan by at least 10x.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Jim Nasby
On May 19, 2011, at 9:53 AM, Robert Haas wrote:
 On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
 Jim Nasby wrote:
 I think the challenge there would be how to define the scope of the
 hot-spot. Is it the last X pages? Last X serial values? Something like
 correlation?
 
 Hmm... it would be interesting if we had average relation access times for
 each stats bucket on a per-column basis; that would give the planner a
 better idea of how much IO overhead there would be for a given WHERE clause
 
 You've already given one reasonable first answer to your question here.  If
 you defined a usage counter for each histogram bucket, and incremented that
 each time something from it was touched, that could lead to a very rough way
 to determine access distribution.  Compute a ratio of the counts in those
 buckets, then have an estimate of the total cached percentage; multiplying
 the two will give you an idea how much of that specific bucket might be in
 memory.  It's not perfect, and you need to incorporate some sort of aging
 method to it (probably weighted average based), but the basic idea could
 work.
 
 Maybe I'm missing something here, but it seems like that would be
 nightmarishly slow.  Every time you read a tuple, you'd have to look
 at every column of the tuple and determine which histogram bucket it
 was in (or, presumably, which MCV it is, since those aren't included
 in working out the histogram buckets).  That seems like it would slow
 down a sequential scan by at least 10x.

You definitely couldn't do it real-time. But you might be able to copy the 
tuple somewhere and have a background process do the analysis.

That said, it might be more productive to know what blocks are available in 
memory and use correlation to guesstimate whether a particular query will need 
hot or cold blocks. Or perhaps we create a different structure that lets you 
track the distribution of each column linearly through the table; something 
more sophisticated than just using correlation; perhaps something like 
indicating which stats bucket was most prevalent in each block/range of blocks 
in a table. That information would allow you to estimate exactly what blocks in 
the table you're likely to need...
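
To make that concrete, the structure I have in mind would look roughly like 
this -- purely hypothetical, nothing like it exists in the catalogs today:

CREATE TABLE block_range_stats_sketch (
    starelid    oid,     -- table
    staattnum   int2,    -- column
    range_start integer, -- first block of the range
    range_end   integer, -- last block of the range
    top_bucket  integer  -- pg_stats histogram bucket most prevalent here
);

Given a WHERE clause, the planner could map the predicate to histogram 
buckets and from there to the block ranges it is likely to touch.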
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Robert Haas
On Thu, May 19, 2011 at 2:39 PM, Jim Nasby j...@nasby.net wrote:
 On May 19, 2011, at 9:53 AM, Robert Haas wrote:
 On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
 Jim Nasby wrote:
 I think the challenge there would be how to define the scope of the
 hot-spot. Is it the last X pages? Last X serial values? Something like
 correlation?

 Hmm... it would be interesting if we had average relation access times for
 each stats bucket on a per-column basis; that would give the planner a
 better idea of how much IO overhead there would be for a given WHERE clause

 You've already given one reasonable first answer to your question here.  If
 you defined a usage counter for each histogram bucket, and incremented that
 each time something from it was touched, that could lead to a very rough way
 to determine access distribution.  Compute a ratio of the counts in those
 buckets, then have an estimate of the total cached percentage; multiplying
 the two will give you an idea how much of that specific bucket might be in
 memory.  It's not perfect, and you need to incorporate some sort of aging
 method to it (probably weighted average based), but the basic idea could
 work.

 Maybe I'm missing something here, but it seems like that would be
 nightmarishly slow.  Every time you read a tuple, you'd have to look
 at every column of the tuple and determine which histogram bucket it
 was in (or, presumably, which MCV it is, since those aren't included
 in working out the histogram buckets).  That seems like it would slow
 down a sequential scan by at least 10x.

 You definitely couldn't do it real-time. But you might be able to copy the 
 tuple somewhere and have a background process do the analysis.

 That said, it might be more productive to know what blocks are available in 
 memory and use correlation to guesstimate whether a particular query will 
 need hot or cold blocks. Or perhaps we create a different structure that lets 
 you track the distribution of each column linearly through the table; 
 something more sophisticated than just using correlation perhaps 
 something like indicating which stats bucket was most prevalent in each 
 block/range of blocks in a table. That information would allow you to 
 estimate exactly what blocks in the table you're likely to need...

Well, all of that stuff sounds impractically expensive to me... but I
just work here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-19 Thread Cédric Villemain
2011/5/19 Jim Nasby j...@nasby.net:
 On May 19, 2011, at 9:53 AM, Robert Haas wrote:
 On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
 Jim Nasby wrote:
 I think the challenge there would be how to define the scope of the
 hot-spot. Is it the last X pages? Last X serial values? Something like
 correlation?

 Hmm... it would be interesting if we had average relation access times for
 each stats bucket on a per-column basis; that would give the planner a
 better idea of how much IO overhead there would be for a given WHERE clause

 You've already given one reasonable first answer to your question here.  If
 you defined a usage counter for each histogram bucket, and incremented that
 each time something from it was touched, that could lead to a very rough way
 to determine access distribution.  Compute a ratio of the counts in those
 buckets, then have an estimate of the total cached percentage; multiplying
 the two will give you an idea how much of that specific bucket might be in
 memory.  It's not perfect, and you need to incorporate some sort of aging
 method to it (probably weighted average based), but the basic idea could
 work.

 Maybe I'm missing something here, but it seems like that would be
 nightmarishly slow.  Every time you read a tuple, you'd have to look
 at every column of the tuple and determine which histogram bucket it
 was in (or, presumably, which MCV it is, since those aren't included
 in working out the histogram buckets).  That seems like it would slow
 down a sequential scan by at least 10x.

 You definitely couldn't do it real-time. But you might be able to copy the 
 tuple somewhere and have a background process do the analysis.

 That said, it might be more productive to know what blocks are available in 
 memory and use correlation to guesstimate whether a particular query will 
 need hot or cold blocks. Or perhaps we create a different structure that lets 
 you track the distribution of each column linearly through the table; 
 something more sophisticated than just using correlation perhaps 
 something like indicating which stats bucket was most prevalent in each 
 block/range of blocks in a table. That information would allow you to 
 estimate exactly what blocks in the table you're likely to need...

Those are very good ideas that I would keep in mind for vacuum/checkpoint
tasks: if you are able to identify hot and cold data, you can then order it
within the segments of the relation. But making it work at the planner level
looks hard. I am not opposed to the idea, but I have no idea how to do it
right now.

 --
 Jim C. Nasby, Database Architect                   j...@nasby.net
 512.569.9461 (cell)                         http://jim.nasby.net



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-18 Thread Greg Smith

Jim Nasby wrote:

I think the challenge there would be how to define the scope of the hot-spot. 
Is it the last X pages? Last X serial values? Something like correlation?

Hmm... it would be interesting if we had average relation access times for each 
stats bucket on a per-column basis; that would give the planner a better idea 
of how much IO overhead there would be for a given WHERE clause


You've already given one reasonable first answer to your question here.  
If you defined a usage counter for each histogram bucket, and 
incremented that each time something from it was touched, that could 
lead to a very rough way to determine access distribution.  Compute a 
ratio of the counts in those buckets, then have an estimate of the total 
cached percentage; multiplying the two will give you an idea how much of 
that specific bucket might be in memory.  It's not perfect, and you need 
to incorporate some sort of aging method to it (probably weighted 
average based), but the basic idea could work.
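
To put rough numbers on that multiplication (an illustration only): a bucket 
that received 40% of the recent touches, in a relation estimated to be 50% 
cached, would score 0.4 * 0.5 = 0.20, versus 0.05 * 0.5 = 0.025 for a bucket 
seeing 5% of the touches.  Not an exact cached fraction, but a usable 
relative ranking of which value ranges are likely to be memory-resident.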


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-17 Thread Jim Nasby
On May 16, 2011, at 10:46 AM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
 Ok, it may not work as well with index'es, since having 1% in cache may very
 well mean that 90% of all requested blocks are there.. for tables in should
 be more trivial.
 
 Tables can have hot spots, too.  Consider a table that holds calendar
 reservations.  Reservations can be inserted, updated, deleted.  But
 typically, the most recent data will be what is most actively
 modified, and the older data will be relatively more (though not
 completely) static, and less frequently accessed.  Such examples are
 common in many real-world applications.
 
 Yes.  I'm not convinced that measuring the fraction of a table or index
 that's in cache is really going to help us much.  Historical cache hit
 rates might be useful, but only to the extent that the incoming query
 has a similar access pattern to those in the (recent?) past.  It's not
 an easy problem.
 
 I almost wonder if we should not try to measure this at all, but instead
 let the DBA set a per-table or per-index number to use, analogous to the
 override we added recently for column n-distinct statistics ...

I think the challenge there would be how to define the scope of the hot-spot. 
Is it the last X pages? Last X serial values? Something like correlation?

Hmm... it would be interesting if we had average relation access times for each 
stats bucket on a per-column basis; that would give the planner a better idea 
of how much IO overhead there would be for a given WHERE clause.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
 To me it seems like a robust and fairly trivial way to to get better
 numbers. The
 fear is that the OS-cache is too much in flux to get any stable numbers
 out
 of it.

 Ok, it may not work as well with index'es, since having 1% in cache may very
 well mean that 90% of all requested blocks are there.. for tables in should
 be more trivial.

Tables can have hot spots, too.  Consider a table that holds calendar
reservations.  Reservations can be inserted, updated, deleted.  But
typically, the most recent data will be what is most actively
modified, and the older data will be relatively more (though not
completely) static, and less frequently accessed.  Such examples are
common in many real-world applications.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
 Ok, it may not work as well with index'es, since having 1% in cache may very
 well mean that 90% of all requested blocks are there.. for tables in should
 be more trivial.

 Tables can have hot spots, too.  Consider a table that holds calendar
 reservations.  Reservations can be inserted, updated, deleted.  But
 typically, the most recent data will be what is most actively
 modified, and the older data will be relatively more (though not
 completely) static, and less frequently accessed.  Such examples are
 common in many real-world applications.

Yes.  I'm not convinced that measuring the fraction of a table or index
that's in cache is really going to help us much.  Historical cache hit
rates might be useful, but only to the extent that the incoming query
has a similar access pattern to those in the (recent?) past.  It's not
an easy problem.

I almost wonder if we should not try to measure this at all, but instead
let the DBA set a per-table or per-index number to use, analogous to the
override we added recently for column n-distinct statistics ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Jeff Janes
On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:

 Ok, it may not work as well with index'es, since having 1% in cache may very
 well mean that 90% of all requested blocks are there.. for tables in should
 be more trivial.

Why would the index have a meaningful hot-spot unless the underlying
table had one as well?  (Of course the root block will be a hot-spot,
but certainly not 90% of all requests)

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:
 Ok, it may not work as well with index'es, since having 1% in cache may very
 well mean that 90% of all requested blocks are there.. for tables in should
 be more trivial.

 Why would the index have a meaningful hot-spot unless the underlying
 table had one as well?  (Of course the root block will be a hot-spot,
 but certainly not 90% of all requests)

The accesses to an index are far more likely to be clustered than the
accesses to the underlying table, because the index is organized in a
way that's application-meaningful and the table not so much.  Continuing
the earlier example of a timestamp column, accesses might preferentially
hit near the right end of the index while the underlying rows are all
over the table.

IOW, hot spots measured at the row level and hot spots measured at the
page level could very easily be different between table and index.
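
As a made-up example of that asymmetry, a query such as

SELECT * FROM reservations WHERE created_at >= now() - interval '1 day';

walks only the right-hand edge of a btree on created_at, so the few index
pages it needs stay hot, while the matching heap rows -- particularly after
updates -- can be scattered across the whole table.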

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Nathan Boley
 The accesses to an index are far more likely to be clustered than the
 accesses to the underlying table, because the index is organized in a
 way that's application-meaningful and the table not so much.

So, to clarify, are you saying that if a query were actually requesting
rows uniformly at random, then there would be no reason to suspect that
index accesses would have hotspots? It seems like the index structure
(i.e., the top nodes in a b-tree) could also get in the way.

Best,
Nathan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Josh Berkus
Robert,

 All true.  I suspect that in practice the different between random and
 sequential memory page costs is small enough to be ignorable, although
 of course I might be wrong.  

This hasn't been my experience, although I have not carefully measured
it.  In fact, there's good reason to suppose that, if you were selecting
50% or more of a table, sequential access would still be faster even for
an entirely in-memory table.

As a parallel to our development, Redis used to store all data as linked
lists, making every object lookup effectively a random lookup.  They
found that even with a database which is pinned in memory, creating a
data page structure (they call it ziplists) and supporting sequential
scans was up to 10X faster for large lists.

So I would assume that there is still a coefficient difference between
seeks and scans in memory until proven otherwise.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Josh Berkus
Stuart,

 I think random_page_cost causes problems because I need to combine
 disk random access time, which I can measure, with a guesstimate of
 the disk cache hit rate.

See, that's wrong. Disk cache hit rate is what effective_cache_size
(ECS) is for.

Really, there's several factors which should be going into the planner's
estimates to determine a probability of a table being cached:

* ratio between total database size and ECS
* ratio between table size and ECS
* ratio between index size and ECS
* whether the table is hot or not
* whether the index is hot or not

The last two statistics are critically important for good estimation,
and they are not things we currently collect.  By hot I mean: is this
a relation which is accessed several times per minute/hour and is thus
likely to be in the cache when we need it?  Currently, we have no way of
knowing that.

Without hot statistics, we're left with guessing based on size, which
results in bad plans for small tables in large databases which are
accessed infrequently.

Mind you, for large tables it would be even better to go beyond that and
actually have some knowledge of which disk pages might be in cache.
However, I think that's beyond feasibility for current software/OSes.
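
The size ratios above are at least cheap to compute already, for example
(using the table and index names from the original post):

SELECT pg_relation_size('contacts')                  AS table_bytes,
       pg_relation_size('ix_contacts_salutation_id') AS index_bytes,
       pg_database_size(current_database())          AS database_bytes,
       current_setting('effective_cache_size')       AS ecs;

It's the hot/not-hot part that we collect nothing for.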

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 2:08 PM, Josh Berkus j...@agliodbs.com wrote:
 All true.  I suspect that in practice the different between random and
 sequential memory page costs is small enough to be ignorable, although
 of course I might be wrong.

 This hasn't been my experience, although I have not carefully measured
 it.  In fact, there's good reason to suppose that, if you were selecting
 50% of more of a table, sequential access would still be faster even for
 an entirely in-memory table.

 As a parallel to our development, Redis used to store all data as linked
 lists, making every object lookup effectively a random lookup.  They
 found that even with a database which is pinned in memory, creating a
 data page structure (they call it ziplists) and supporting sequential
 scans was up to 10X faster for large lists.

 So I would assume that there is still a coefficient difference between
 seeks and scans in memory until proven otherwise.

Well, anything's possible.  But I wonder whether the effects you are
describing might result from a reduction in the *number* of pages
accessed rather than a change in the access pattern.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Cédric Villemain
2011/5/15 Josh Berkus j...@agliodbs.com:
 Stuart,

 I think random_page_cost causes problems because I need to combine
 disk random access time, which I can measure, with a guesstimate of
 the disk cache hit rate.

 See, that's wrong. Disk cache hit rate is what effective_cache_size
 (ECS) is for.

 Really, there's several factors which should be going into the planner's
 estimates to determine a probability of a table being cached:

 * ratio between total database size and ECS
 * ratio between table size and ECS
 * ratio between index size and ECS
 * whether the table is hot or not
 * whether the index is hot or not

 The last two statistics are critically important for good estimation,
 and they are not things we currently collect.  By hot I mean: is this
 a relation which is accessed several times per minute/hour and is thus
 likely to be in the cache when we need it?  Currently, we have no way of
 knowing that.

 Without hot statistics, we're left with guessing based on size, which
 results in bad plans for small tables in large databases which are
 accessed infrequently.

 Mind you, for large tables it would be even better to go beyond that and
 actually have some knowledge of which

*which*? Do you mean 'areas' of the tables?

 disk pages might be in cache.
 However, I think that's beyond feasibility for current software/OSes.

Maybe not :) mincore is available in many OSes, and Windows has
options to get those stats too.


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Greg Smith

Craig Ringer wrote:

AFAIK, mincore() is only useful for mmap()ed files and for finding out
if it's safe to access certain blocks of memory w/o risking triggering
heavy swapping.

It doesn't provide any visibility into the OS's block device / file
system caches; you can't ask it how much of this file is cached in RAM
or is this range of blocks in this file cached in RAM.
  


You should try out pgfincore if you think this can't be done!
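
From memory of pgfincore's interface (treat the exact output columns as an
assumption and check the extension's documentation), something like

SELECT * FROM pgfincore('contacts');

reports, per segment of the relation, how many OS pages exist and how many
of them mincore() says are currently resident in the OS page cache.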


Even if you could, it's hard to see how an approach that relied on
asking the OS through system calls about the cache state when planning
every query could be fast enough to be viable.
  


You can't do it in real-time.  You don't necessarily want that even 
if it were possible; too many possibilities for nasty feedback loops 
where you always favor using some marginal index that happens to be in 
memory, and therefore never page in things that would be faster once 
they're read.  The only reasonable implementation that avoids completely 
unstable plans is to scan this data periodically and save some 
statistics on it--the way ANALYZE does--and then have that turn into a 
planner input.


The related secondary idea of just making assumptions about small 
tables/indexes, too, may be a useful heuristic to layer on top of this.  
There's a pile of ideas here that all seem reasonable both in terms of 
modeling real-world behavior and as things that could be inserted into 
the optimizer.  As usual, I suspect that work needs to be followed by 
a giant testing exercise though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-15 Thread Jesper Krogh

On 2011-05-16 06:41, Jesper Krogh wrote:

On 2011-05-16 03:18, Greg Smith wrote:

 You can't do it in real-time. You don't necessarily want that to
 even if it were possible; too many possibilities for nasty feedback
 loops where you always favor using some marginal index that happens
 to be in memory, and therefore never page in things that would be
 faster once they're read. The only reasonable implementation that
 avoids completely unstable plans is to scan this data periodically
 and save some statistics on it--the way ANALYZE does--and then have
 that turn into a planner input.


Would that be feasible? Have a process collecting the data every
now and then, probably picking some conservative-average function and
feeding it into pg_stats for each index/relation?

To me it seems like a robust and fairly trivial way to get better
numbers. The fear is that the OS cache is too much in flux to get any
stable numbers out of it.


Ok, it may not work as well with indexes, since having 1% in cache may very
well mean that 90% of all requested blocks are there... for tables it should
be more trivial.

--
Jesper

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-14 Thread Stuart Bishop
On Sat, May 14, 2011 at 3:13 AM, Josh Berkus j...@agliodbs.com wrote:

 This is what the combination of random_page_cost and
 effective_cache_size ought to supply, but I don't think it does, quite.

I think random_page_cost causes problems because I need to combine
disk random access time, which I can measure, with a guesstimate of
the disk cache hit rate. It would be lovely if these two variables
were separate. It would be even lovelier if the disk cache hit rate
could be probed at run time and didn't need setting at all, but I
suspect that isn't possible on some platforms.

-- 
Stuart Bishop stu...@stuartbishop.net
http://www.stuartbishop.net/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Sok Ann Yap sok...@gmail.com wrote:

 So, index scan wins by a very small margin over sequential scan
 after the tuning. I am a bit puzzled because index scan is more
 than 3000 times faster in this case, but the estimated costs are
 about the same. Did I do something wrong?

 Tuning is generally needed to get best performance from PostgreSQL.
 Needing to reduce random_page_cost is not unusual in situations
 where a good portion of the active data is in cache (between
 shared_buffers and the OS cache).  Please show us your overall
 configuration and give a description of the hardware (how many of
 what kind of cores, how much RAM, what sort of storage system).  The
 configuration part can be obtained by running the query on this page
 and pasting the result into your next post:

 http://wiki.postgresql.org/wiki/Server_Configuration

 There are probably some other configuration adjustments you could do
 to ensure that good plans are chosen.

 The very first thing to check is effective_cache_size and to set it to
 a reasonable value.

Actually, effective_cache_size has no impact on costing except when
planning a nested loop with inner index scan.  So, a query against a
single table can never benefit from changing that setting.  Kevin's
suggestion of adjusting seq_page_cost and random_page_cost is the way
to go.

We've talked in the past (and I still think it's a good idea, but
haven't gotten around to doing anything about it) about adjusting the
planner to attribute to each relation the percentage of its pages
which we believe we'll find in cache.  Although many complicated ideas
for determining that percentage have been proposed, my favorite one is
fairly simple: assume that small relations will be mostly or entirely
cached, and that big ones won't be.   Allow the administrator to
override the result on a per-relation basis.  It's difficult to
imagine a situation where the planner should assume that a relation
with only a handful of pages isn't going to be cached.  Even if it
isn't, as soon as someone begins accessing it, it will be.
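
Syntax-wise, the override could be as simple as this -- hypothetical, no
such reloption exists today:

ALTER TABLE contacts SET (assumed_cached_fraction = 0.9);

with small relations defaulting to a value near 1.0 when nothing is set.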

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 We've talked in the past (and I still think it's a good idea, but
 haven't gotten around to doing anything about it) about adjusting
 the planner to attribute to each relation the percentage of its
 pages which we believe we'll find in cache.  Although many
 complicated ideas for determining that percentage have been
 proposed, my favorite one is fairly simple: assume that small
 relations will be mostly or entirely cached, and that big ones
 won't be.   Allow the administrator to override the result on a
 per-relation basis.  It's difficult to imagine a situation where
 the planner should assume that a relation with only handful of
 pages isn't going to be cached.  Even if it isn't, as soon as
 someone begins accessing it, it will be.
 
Simple as the heuristic is, I bet it would be effective.  While one
can easily construct a synthetic case where it falls down, the ones
I can think of aren't all that common, and you are suggesting an
override mechanism.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The very first thing to check is effective_cache_size and to set it to
 a reasonable value.

 Actually, effective_cache_size has no impact on costing except when
 planning a nested loop with inner index scan.  So, a query against a
 single table can never benefit from changing that setting.

That's flat out wrong.  It does affect the cost estimate for plain
indexscan (and bitmap indexscan) plans.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 3:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote:
 The very first thing to check is effective_cache_size and to set it to
 a reasonable value.

 Actually, effective_cache_size has no impact on costing except when
 planning a nested loop with inner index scan.  So, a query against a
 single table can never benefit from changing that setting.

 That's flat out wrong.  It does affect the cost estimate for plain
 indexscan (and bitmap indexscan) plans.

*rereads code*

OK, I agree.  I obviously misinterpreted this code the last time I read it.

I guess maybe the reason why it didn't matter for the OP is that - if
the size of the index in pages is smaller than the pro-rated
fraction of effective_cache_size allowed to the index - then the exact
value doesn't affect the answer.

I apparently need to study this code more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Josh Berkus

 I guess maybe the reason why it didn't matter for the OP is that - if
 the size of the index page in pages is smaller than the pro-rated
 fraction of effective_cache_size allowed to the index - then the exact
 value doesn't affect the answer.
 
 I apparently need to study this code more.

FWIW: random_page_cost is meant to be the ratio between the cost of
looking up a single row as an index lookup, and the cost of looking up
that same row as part of a larger sequential scan.  For specific
storage, that coefficient should be roughly the same regardless of the
table size.  So if your plan for optimization involves manipulating RPC
for anything other than a change of storage, you're Doing It Wrong.

Instead, we should be fixing the formulas these are based on and leaving
RPC alone.

For any data page, there are actually four costs associated with each
tuple lookup, per:

in-memory/seq    | on disk/seq
-----------------+-----------------
in-memory/random | on disk/random

(yes, there's actually more for bitmapscan etc.  but the example holds)

For any given tuple lookup, then, you can assign a cost based on where
you think that tuple falls in that quadrant map.  Since this is all
probability-based, you'd be assigning a cost as a mixed % of in-memory
and on-disk costs.  Improvements in accuracy of this formula would come
through improvements in accuracy in predicting if a particular data page
will be in memory.

This is what the combination of random_page_cost and
effective_cache_size ought to supply, but I don't think it does, quite.
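
As a worked illustration of that mixing (numbers invented): if the planner
believed a page had a 90% chance of being in memory, with a hypothetical
cached_page_cost of 0.05 and random_page_cost of 4.0, the blended cost of a
random fetch of that page would be 0.9 * 0.05 + 0.1 * 4.0 = 0.445, rather
than a flat 4.0 (or a flat hand-tuned 2.0).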

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Cédric Villemain
2011/5/13 Josh Berkus j...@agliodbs.com:

 I guess maybe the reason why it didn't matter for the OP is that - if
 the size of the index page in pages is smaller than the pro-rated
 fraction of effective_cache_size allowed to the index - then the exact
 value doesn't affect the answer.

 I apparently need to study this code more.

 FWIW: random_page_cost is meant to be the ratio between the cost of
 looking up a single row as and index lookup, and the cost of looking up
 that same row as part of a larger sequential scan.  For specific
 storage, that coefficient should be roughly the same regardless of the
 table size.  So if your plan for optimization involves manipulating RPC
 for anything other than a change of storage, you're Doing It Wrong.

 Instead, we should be fixing the formulas these are based on and leaving
 RPC alone.

 For any data page, there are actually four costs associated with each
 tuple lookup, per:

 in-memory/seq    | on disk/seq
 -----------------+-----------------
 in-memory/random | on disk/random

It lacks some more theoretical ones, like sort_page/temp_page: those are
based on a ratio of seq_page_cost and random_page_cost, or on a simple
seq_page_cost (when working out of work_mem).

Memory access is accounted for with something like 0.1 in some places, AFAIR
(and memory random/seq is the same at the level of estimation we do).


 (yes, there's actually more for bitmapscan etc.  but the example holds)

(If I read the sources correctly, for this one there is a linear approach
to weight the cost between random_page_cost and seq_page_cost for the heap
page fetches, plus the Mackert and Lohman formula, if needed, in its best
usage: predicting what should be in cache *because* of the current query
execution, not because of the current state of the page cache.)


 For any given tuple lookup, then, you can assign a cost based on where
 you think that tuple falls in that quadrant map.  Since this is all
 probability-based, you'd be assigning a cost as a mixed % of in-memory
 and on-disk costs.  Improvements in accuracy of this formula would come
 through improvements in accuracy in predicting if a particular data page
 will be in memory.

 This is what the combination of random_page_cost and
 effective_cache_size ought to supply, but I don't think it does, quite.

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-13 Thread Robert Haas
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus j...@agliodbs.com wrote:
 Instead, we should be fixing the formulas these are based on and leaving
 RPC alone.

 For any data page, there are actually four costs associated with each
 tuple lookup, per:

All true.  I suspect that in practice the difference between random and
sequential memory page costs is small enough to be ignorable, although
of course I might be wrong.  I've never seen a database that was fully
cached in memory where it was necessary to set
random_page_cost > seq_page_cost to get good plans -- no doubt partly
because even if the pages were consecutive on disk, there's no reason
to suppose they would be so in memory, and we certainly wouldn't know
one way or the other at planning time.   But I agree we should add a
cached_page_cost as part of all this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-28 Thread Sok Ann Yap
On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Sok Ann Yap sok...@gmail.com wrote:

 Anyway, the overhead of spawning 44 extra queries means that it is
 still better off for me to stick with the original query and tune
 PostgreSQL to choose index scan.

 Maybe, but what is *best* for you is to tune PostgreSQL so that your
 costs are accurately modeled, at which point it will automatically
 pick the best plan for most or all of your queries without you
 needing to worry about it.

 If you set your effective_cache_size to the sum of shared_buffers
 and what your OS reports as cache after you've been running a while,
 that will help the optimizer know what size index fits in RAM, and
 will tend to encourage index use.  If the active portion of your
 data is heavily cached, you might want to set random_page_cost and
 seq_page_cost to the same value, and make that value somewhere in
 the 0.1 to 0.05 range.  If you have moderate caching, using 1 and 2
 can be good.

 If you're still not getting reasonable plans, please post again with
 more information about your hardware along with the query and its
 EXPLAIN ANALYZE output.

 -Kevin


I understand the need to tune PostgreSQL properly for my use case.
What I am curious about is, for the data set I have, under what
circumstances (hardware/workload/cache status/etc) would a sequential
scan really be faster than an index scan for that particular query?

To simulate a scenario when nothing is cached, I stopped PostgreSQL,
dropped all system cache (sync; echo 3 > /proc/sys/vm/drop_caches),
restarted PostgreSQL, and ran the query. A sequential scan run took
13.70 seconds, while an index scan run took 0.34 seconds, which is
still 40 times faster.

Also, I tried increasing effective_cache_size from 512MB to 3GB (the
database size is 2+GB), and it still favors sequential scan. The
estimated costs did not change at all.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-28 Thread Sok Ann Yap
On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Sok Ann Yap  wrote:
 Kevin Grittner  wrote:

 Please show us your overall configuration and give a description
 of the hardware (how many of what kind of cores, how much RAM,
 what sort of storage system).

 Here's the configuration (this is just a low end laptop):

 version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu,
 compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0,
 pie-0.4.5) 4.5.2, 64-bit
 checkpoint_segments | 16
 default_statistics_target | 1

 Usually overkill.  If this didn't help, you should probably change it
 back.

 effective_cache_size | 512MB
 lc_collate | en_US.UTF-8
 lc_ctype | en_US.UTF-8
 listen_addresses | *
 log_destination | syslog
 log_min_duration_statement | 0
 maintenance_work_mem | 256MB
 max_connections | 100

 You probably don't need this many connections.

 max_stack_depth | 2MB
 port | 5432
 random_page_cost | 4
 server_encoding | UTF8
 shared_buffers | 256MB
 silent_mode | on
 TimeZone | Asia/Kuala_Lumpur
 wal_buffers | 1MB
 work_mem | 32MB
 (20 rows)

 It's hard to recommend other changes without knowing the RAM on the
 system.  How many of what kind of CPUs would help, too.

 The thing is, the query I posted was fairly simple (I think), and
 PostgreSQL should be able to choose the 3000+ times faster index
 scan with the default random_page_cost of 4.

 It picks the plan with the lowest estimated cost.  If it's not
 picking the best plan, that's usually an indication that you need to
 adjust cost factors so that estimates better model the actual costs.

 If I need to reduce it to 2 when using a 5.4k rpm slow disk, what
 is random_page_cost = 4 good for?

 It's good for large databases with a lot of physical disk I/O.  In
 fact, in some of those cases, it needs to be higher.  In your test,
 the numbers indicate that everything was cached in RAM.  That makes
 the effective cost very low.

 Also, the odds are that you have more total cache space between the
 shared_buffers and the OS cache than the effective_cache_size
 setting, so the optimizer doesn't expect the number of cache hits
 you're getting on index usage.

 -Kevin


Thanks for the tips and explanation. I wrongly assumed the
random_page_cost value is independent from caching.

Now, let's go back to the original query:

SELECT
salutations.id,
salutations.name,
EXISTS (
SELECT 1
FROM contacts
WHERE salutations.id = contacts.salutation_id
) AS in_use
FROM salutations

If I split up the query, i.e. running this once:

SELECT
salutations.id,
salutations.name
FROM salutations

and then running this 44 times, once for each row:

SELECT
EXISTS (
SELECT 1
FROM contacts
WHERE contacts.salutation_id = ?
) AS in_use

I can see that PostgreSQL will smartly pick the best plan, i.e. for
common salutations (Madam, Ms, etc), it will do sequential scan, while
for salutations that are rarely used or not used at all, it will do
index scan.

Anyway, the overhead of spawning 44 extra queries means that it is
still better off for me to stick with the original query and tune
PostgreSQL to choose index scan.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-28 Thread Jeff Janes
On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap sok...@gmail.com wrote:

 I understand the need to tune PostgreSQL properly for my use case.
 What I am curious about is, for the data set I have, under what
 circumstances (hardware/workload/cache status/etc) would a sequential
 scan really be faster than an index scan for that particular query?


The sequential scan on contacts can be terminated as soon as the first
matching row is found.  If each block of the contacts table contains
one example of each salutation, then the inner sequential scan will
always be very short, and faster than an index scan.

I can engineer this to be the case by populating the table like this:

insert into contacts select (generate_series%44+1)::int from
generate_series (1,100);

Here I get the seq scan being 2.6ms while the index scan is 5.6ms.

Predicting how far the inner scan needs to go would be quite
difficult, and I don't know how the system will do it.

However, when I create and populate simple tables based on your
description, I get the index scan being the lower estimated cost.  So
the tables I built are not sufficient to study the matter in detail.



Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Kevin Grittner
Sok Ann Yap  wrote:
 Kevin Grittner  wrote:
 
 Please show us your overall configuration and give a description
 of the hardware (how many of what kind of cores, how much RAM,
 what sort of storage system).
 
 Here's the configuration (this is just a low end laptop):
 
 version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu,
 compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0,
 pie-0.4.5) 4.5.2, 64-bit
 checkpoint_segments | 16
 default_statistics_target | 1
 
Usually overkill.  If this didn't help, you should probably change it
back.
 
 effective_cache_size | 512MB
 lc_collate | en_US.UTF-8
 lc_ctype | en_US.UTF-8
 listen_addresses | *
 log_destination | syslog
 log_min_duration_statement | 0
 maintenance_work_mem | 256MB
 max_connections | 100
 
You probably don't need this many connections.
 
 max_stack_depth | 2MB
 port | 5432
 random_page_cost | 4
 server_encoding | UTF8
 shared_buffers | 256MB
 silent_mode | on
 TimeZone | Asia/Kuala_Lumpur
 wal_buffers | 1MB
 work_mem | 32MB
 (20 rows)
 
It's hard to recommend other changes without knowing the RAM on the
system.  How many of what kind of CPUs would help, too.
 
 The thing is, the query I posted was fairly simple (I think), and
 PostgreSQL should be able to choose the 3000+ times faster index
 scan with the default random_page_cost of 4.
 
It picks the plan with the lowest estimated cost.  If it's not
picking the best plan, that's usually an indication that you need to
adjust cost factors so that estimates better model the actual costs.
 
 If I need to reduce it to 2 when using a 5.4k rpm slow disk, what
 is random_page_cost = 4 good for?
 
It's good for large databases with a lot of physical disk I/O.  In
fact, in some of those cases, it needs to be higher.  In your test,
the numbers indicate that everything was cached in RAM.  That makes
the effective cost very low.
 
Also, the odds are that you have more total cache space between the
shared_buffers and the OS cache than the effective_cache_size
setting, so the optimizer doesn't expect the number of cache hits
you're getting on index usage.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-27 Thread Kevin Grittner
Sok Ann Yap sok...@gmail.com wrote:
 
 Anyway, the overhead of spawning 44 extra queries means that it is
 still better off for me to stick with the original query and tune
 PostgreSQL to choose index scan.
 
Maybe, but what is *best* for you is to tune PostgreSQL so that your
costs are accurately modeled, at which point it will automatically
pick the best plan for most or all of your queries without you
needing to worry about it.
 
If you set your effective_cache_size to the sum of shared_buffers
and what your OS reports as cache after you've been running a while,
that will help the optimizer know what size index fits in RAM, and
will tend to encourage index use.  If the active portion of your
data is heavily cached, you might want to set random_page_cost and
seq_page_cost to the same value, and make that value somewhere in
the 0.1 to 0.05 range.  If you have moderate caching, using 1 and 2
can be good.
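 
For example, to try that on a mostly-cached system before touching
postgresql.conf (the values here are just the ones suggested above):
 
SET effective_cache_size = '3GB';
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
EXPLAIN ANALYZE SELECT ...;  -- the problem query
 
If the plans look right, make the same settings permanent in
postgresql.conf.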
 
If you're still not getting reasonable plans, please post again with
more information about your hardware along with the query and its
EXPLAIN ANALYZE output.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-26 Thread Sok Ann Yap
Hi,

I am using PostgreSQL 9.0. There is a salutations table with 44 rows,
and a contacts table with more than a million rows. The contacts table
has a nullable (only 0.002% null) salutation_id column, referencing
salutations.id.

With this query:

SELECT
salutations.id,
salutations.name,
salutations.description,
EXISTS (
SELECT 1
FROM contacts
WHERE salutations.id = contacts.salutation_id
) AS in_use
FROM salutations

I have to reduce random_page_cost from 4 to 2 to force index scan.

EXPLAIN ANALYSE output with random_page_cost = 4:

 Seq Scan on salutations  (cost=0.00..50.51 rows=44 width=229) (actual time=0.188..3844.037 rows=44 loops=1)
   SubPlan 1
     ->  Seq Scan on contacts  (cost=0.00..64578.41 rows=57906 width=0) (actual time=87.358..87.358 rows=1 loops=44)
           Filter: ($0 = salutation_id)
 Total runtime: 3844.113 ms

EXPLAIN ANALYSE output with random_page_cost = 4, enable_seqscan = 0:

 Seq Scan on salutations  (cost=100.00..195.42 rows=44 width=229) (actual time=0.053..0.542 rows=44 loops=1)
   SubPlan 1
     ->  Index Scan using ix_contacts_salutation_id on contacts  (cost=0.00..123682.07 rows=57906 width=0) (actual time=0.011..0.011 rows=1 loops=44)
           Index Cond: ($0 = salutation_id)
 Total runtime: 0.592 ms

EXPLAIN ANALYSE output with random_page_cost = 2:

 Seq Scan on salutations  (cost=0.00..48.87 rows=44 width=229) (actual time=0.053..0.541 rows=44 loops=1)
   SubPlan 1
     ->  Index Scan using ix_contacts_salutation_id on contacts  (cost=0.00..62423.45 rows=57906 width=0) (actual time=0.011..0.011 rows=1 loops=44)
           Index Cond: ($0 = salutation_id)
 Total runtime: 0.594 ms

So, index scan wins by a very small margin over sequential scan after
the tuning. I am a bit puzzled because index scan is more than 3000
times faster in this case, but the estimated costs are about the same.
Did I do something wrong?

Regards,
Yap

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-26 Thread Kevin Grittner
Sok Ann Yap sok...@gmail.com wrote:
 
 So, index scan wins by a very small margin over sequential scan
 after the tuning. I am a bit puzzled because index scan is more
 than 3000 times faster in this case, but the estimated costs are
 about the same. Did I do something wrong?
 
Tuning is generally needed to get best performance from PostgreSQL. 
Needing to reduce random_page_cost is not unusual in situations
where a good portion of the active data is in cache (between
shared_buffers and the OS cache).  Please show us your overall
configuration and give a description of the hardware (how many of
what kind of cores, how much RAM, what sort of storage system).  The
configuration part can be obtained by running the query on this page
and pasting the result into your next post:
 
http://wiki.postgresql.org/wiki/Server_Configuration
 
There are probably some other configuration adjustments you could do
to ensure that good plans are chosen.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance