Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Hello. To me, all this hot-data tracking looks quite uncertain and dynamic. Two things I could use directly right now (and they are needed as a pair) are:

1) Per-table/index/database buffer pools: split shared buffers into parts, and allow specifying which index/table/database goes where.
2) Per-table/index cost settings.

If I had this, I could allocate dedicated buffer pools for the tables and indexes that MUST stay hot in memory, and set low costs for these specific tables.

P.S. A third thing, which would be great to have alongside these two, is a "load on startup" flag to automatically populate the buffer pools with a fast sequential read, but that can be easily emulated with a statement (e.g. a SELECT count(*) over the table).

Best regards, Vitalii Tymchyshyn

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Well, all of that stuff sounds impractically expensive to me... but I just work here.

I'll point out that the simple version, which just checks for hot tables and indexes, would improve estimates greatly and be a LOT less complicated than these proposals. Certainly having some form of block-based or range-based stats would be better, but that also sounds hard enough that it might never get done. Having user-accessible hot stats would also be useful to DBAs.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Mon, May 23, 2011 at 3:08 PM, Josh Berkus j...@agliodbs.com wrote:
> Well, all of that stuff sounds impractically expensive to me... but I just work here. I'll point out that the simple version, which just checks for hot tables and indexes, would improve estimates greatly and be a LOT less complicated than these proposals.

I realize I'm sounding like a broken record here, but as far as I can tell there is absolutely zero evidence that that would be better. I'm sure you're in good company thinking so, but the list of things that could skew (or should I say, screw) the estimates is long and painful; and if those estimates are wrong, you'll end up with something that is both worse and less predictable than the status quo.

First, I haven't seen a shred of hard evidence that the contents of the buffer cache or OS cache are stable enough to be relied upon, and we've repeatedly discussed workloads where that might not be true. Has anyone done a systematic study of this on a variety of real production systems? If so, the results haven't been posted here, at least not that I can recall.

Second, even if we were willing to accept that we could obtain relatively stable and accurate measurements of this data, who is to say that basing plans on it would actually result in an improvement in plan quality? That may seem obvious, but I don't think it is. The proposed method is a bit like trying to determine the altitude of a hot air balloon by throwing the ballast over the side and timing how long it takes to hit the ground. Executing plans that are based on the contents of the cache will change the contents of the cache, which will in turn change the plans. The idea that we can know, without any experimentation, how that's going to shake out seems to me to be an exercise in unjustified optimism of the first order.

Sorry to sound grumpy and pessimistic, but I really think we're letting our enthusiasm get way, way ahead of the evidence.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
> Jim Nasby wrote:
>> I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that would give the planner a better idea of how much IO overhead there would be for a given WHERE clause.
>
> You've already given one reasonable first answer to your question here. If you defined a usage counter for each histogram bucket, and incremented that each time something from it was touched, that could lead to a very rough way to determine access distribution. Compute a ratio of the counts in those buckets, then have an estimate of the total cached percentage; multiplying the two will give you an idea how much of that specific bucket might be in memory. It's not perfect, and you need to incorporate some sort of aging method to it (probably weighted average based), but the basic idea could work.

Maybe I'm missing something here, but it seems like that would be nightmarishly slow. Every time you read a tuple, you'd have to look at every column of the tuple and determine which histogram bucket it was in (or, presumably, which MCV it is, since those aren't included in working out the histogram buckets). That seems like it would slow down a sequential scan by at least 10x.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On May 19, 2011, at 9:53 AM, Robert Haas wrote:
> On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
>> Jim Nasby wrote:
>>> I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that would give the planner a better idea of how much IO overhead there would be for a given WHERE clause.
>>
>> You've already given one reasonable first answer to your question here. If you defined a usage counter for each histogram bucket, and incremented that each time something from it was touched, that could lead to a very rough way to determine access distribution. Compute a ratio of the counts in those buckets, then have an estimate of the total cached percentage; multiplying the two will give you an idea how much of that specific bucket might be in memory. It's not perfect, and you need to incorporate some sort of aging method to it (probably weighted average based), but the basic idea could work.
>
> Maybe I'm missing something here, but it seems like that would be nightmarishly slow. Every time you read a tuple, you'd have to look at every column of the tuple and determine which histogram bucket it was in (or, presumably, which MCV it is, since those aren't included in working out the histogram buckets). That seems like it would slow down a sequential scan by at least 10x.

You definitely couldn't do it real-time. But you might be able to copy the tuple somewhere and have a background process do the analysis. That said, it might be more productive to know what blocks are available in memory and use correlation to guesstimate whether a particular query will need hot or cold blocks.

Or perhaps we create a different structure that lets you track the distribution of each column linearly through the table; something more sophisticated than just using correlation, perhaps something like indicating which stats bucket was most prevalent in each block/range of blocks in a table. That information would allow you to estimate exactly which blocks in the table you're likely to need...

-- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Thu, May 19, 2011 at 2:39 PM, Jim Nasby j...@nasby.net wrote:
> On May 19, 2011, at 9:53 AM, Robert Haas wrote:
>> On Wed, May 18, 2011 at 11:00 PM, Greg Smith g...@2ndquadrant.com wrote:
>>> Jim Nasby wrote:
>>>> I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that would give the planner a better idea of how much IO overhead there would be for a given WHERE clause.
>>>
>>> You've already given one reasonable first answer to your question here. If you defined a usage counter for each histogram bucket, and incremented that each time something from it was touched, that could lead to a very rough way to determine access distribution. Compute a ratio of the counts in those buckets, then have an estimate of the total cached percentage; multiplying the two will give you an idea how much of that specific bucket might be in memory. It's not perfect, and you need to incorporate some sort of aging method to it (probably weighted average based), but the basic idea could work.
>>
>> Maybe I'm missing something here, but it seems like that would be nightmarishly slow. Every time you read a tuple, you'd have to look at every column of the tuple and determine which histogram bucket it was in (or, presumably, which MCV it is, since those aren't included in working out the histogram buckets). That seems like it would slow down a sequential scan by at least 10x.
>
> You definitely couldn't do it real-time. But you might be able to copy the tuple somewhere and have a background process do the analysis. That said, it might be more productive to know what blocks are available in memory and use correlation to guesstimate whether a particular query will need hot or cold blocks.
>
> Or perhaps we create a different structure that lets you track the distribution of each column linearly through the table; something more sophisticated than just using correlation, perhaps something like indicating which stats bucket was most prevalent in each block/range of blocks in a table. That information would allow you to estimate exactly which blocks in the table you're likely to need...

Well, all of that stuff sounds impractically expensive to me... but I just work here.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
2011/5/19 Jim Nasby j...@nasby.net:
> On May 19, 2011, at 9:53 AM, Robert Haas wrote:
>> Maybe I'm missing something here, but it seems like that would be nightmarishly slow. Every time you read a tuple, you'd have to look at every column of the tuple and determine which histogram bucket it was in (or, presumably, which MCV it is, since those aren't included in working out the histogram buckets). That seems like it would slow down a sequential scan by at least 10x.
>
> You definitely couldn't do it real-time. But you might be able to copy the tuple somewhere and have a background process do the analysis. That said, it might be more productive to know what blocks are available in memory and use correlation to guesstimate whether a particular query will need hot or cold blocks.
>
> Or perhaps we create a different structure that lets you track the distribution of each column linearly through the table; something more sophisticated than just using correlation, perhaps something like indicating which stats bucket was most prevalent in each block/range of blocks in a table. That information would allow you to estimate exactly which blocks in the table you're likely to need...

Those are very good ideas; I would keep them in mind for vacuum/checkpoint tasks: if you are able to tell hot data from cold, you can order it within the segments of the relation. But making it work at the planner level looks hard. I am not opposed to the idea, but I have no idea how to do it right now.

-- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Jim Nasby wrote:
> I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that would give the planner a better idea of how much IO overhead there would be for a given WHERE clause.

You've already given one reasonable first answer to your question here. If you defined a usage counter for each histogram bucket, and incremented that each time something from it was touched, that could lead to a very rough way to determine access distribution. Compute a ratio of the counts in those buckets, then have an estimate of the total cached percentage; multiplying the two will give you an idea how much of that specific bucket might be in memory. It's not perfect, and you need to incorporate some sort of aging method to it (probably weighted average based), but the basic idea could work.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
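Greg's bucket-counter scheme above is concrete enough to sketch. The following is a toy illustration only, not anything PostgreSQL implements: per-bucket usage counters with weighted-average aging, combined with an overall cached fraction as he describes. The normalization by bucket count is my own reading of "compute a ratio of the counts" (it makes a perfectly uniform access pattern reproduce the overall cached fraction in every bucket); the class name and the `alpha` parameter are invented.

```python
class BucketHeat:
    """Toy sketch of per-histogram-bucket usage counters with aging.

    touch() is the (hypothetical) hook called when a value falling in a
    bucket is accessed; age() is run periodically, ANALYZE-style, so old
    activity decays via a weighted average; cached_estimate() combines
    the access ratios with an overall cached fraction, as Greg proposes.
    """

    def __init__(self, nbuckets, alpha=0.2):
        self.counts = [0.0] * nbuckets
        self.alpha = alpha  # weight given to the newest observation window

    def touch(self, bucket):
        self.counts[bucket] += 1

    def age(self):
        # Weighted-average aging: decay old counts each maintenance cycle.
        self.counts = [c * (1 - self.alpha) for c in self.counts]

    def cached_estimate(self, total_cached_fraction):
        n = len(self.counts)
        total = sum(self.counts)
        if total == 0:
            # No access data yet: fall back to the overall fraction.
            return [total_cached_fraction] * n
        # Access ratio per bucket, scaled so uniform access yields exactly
        # total_cached_fraction everywhere, capped at fully cached.
        return [min(1.0, (c / total) * n * total_cached_fraction)
                for c in self.counts]
```

Under this model a bucket that soaks up most of the accesses (say, the rightmost bucket of a timestamp column) gets a cached estimate near 1.0 while cold buckets drop toward 0, which is exactly the distinction the planner lacks today.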
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On May 16, 2011, at 10:46 AM, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
>>> Ok, it may not work as well with indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there.. for tables it should be more trivial.
>>
>> Tables can have hot spots, too. Consider a table that holds calendar reservations. Reservations can be inserted, updated, deleted. But typically, the most recent data will be what is most actively modified, and the older data will be relatively more (though not completely) static, and less frequently accessed. Such examples are common in many real-world applications.
>
> Yes. I'm not convinced that measuring the fraction of a table or index that's in cache is really going to help us much. Historical cache hit rates might be useful, but only to the extent that the incoming query has a similar access pattern to those in the (recent?) past. It's not an easy problem. I almost wonder if we should not try to measure this at all, but instead let the DBA set a per-table or per-index number to use, analogous to the override we added recently for column n-distinct statistics ...

I think the challenge there would be how to define the scope of the hot-spot. Is it the last X pages? Last X serial values? Something like correlation? Hmm... it would be interesting if we had average relation access times for each stats bucket on a per-column basis; that would give the planner a better idea of how much IO overhead there would be for a given WHERE clause.

-- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
> To me it seems like a robust and fairly trivial way to get better numbers. The fear is that the OS cache is too much in flux to get any stable numbers out of it.
>
> Ok, it may not work as well with indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there.. for tables it should be more trivial.

Tables can have hot spots, too. Consider a table that holds calendar reservations. Reservations can be inserted, updated, deleted. But typically, the most recent data will be what is most actively modified, and the older data will be relatively more (though not completely) static, and less frequently accessed. Such examples are common in many real-world applications.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert Haas robertmh...@gmail.com writes:
> On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh jes...@krogh.cc wrote:
>> Ok, it may not work as well with indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there.. for tables it should be more trivial.
>
> Tables can have hot spots, too. Consider a table that holds calendar reservations. Reservations can be inserted, updated, deleted. But typically, the most recent data will be what is most actively modified, and the older data will be relatively more (though not completely) static, and less frequently accessed. Such examples are common in many real-world applications.

Yes. I'm not convinced that measuring the fraction of a table or index that's in cache is really going to help us much. Historical cache hit rates might be useful, but only to the extent that the incoming query has a similar access pattern to those in the (recent?) past. It's not an easy problem. I almost wonder if we should not try to measure this at all, but instead let the DBA set a per-table or per-index number to use, analogous to the override we added recently for column n-distinct statistics ...

regards, tom lane
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:
> Ok, it may not work as well with indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there.. for tables it should be more trivial.

Why would the index have a meaningful hot-spot unless the underlying table had one as well? (Of course the root block will be a hot-spot, but certainly not 90% of all requests.)

Cheers, Jeff
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Jeff Janes jeff.ja...@gmail.com writes:
> On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh jes...@krogh.cc wrote:
>> Ok, it may not work as well with indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there.. for tables it should be more trivial.
>
> Why would the index have a meaningful hot-spot unless the underlying table had one as well? (Of course the root block will be a hot-spot, but certainly not 90% of all requests.)

The accesses to an index are far more likely to be clustered than the accesses to the underlying table, because the index is organized in a way that's application-meaningful and the table not so much. Continuing the earlier example of a timestamp column, accesses might preferentially hit near the right end of the index while the underlying rows are all over the table. IOW, hot spots measured at the row level and hot spots measured at the page level could very easily be different between table and index.

regards, tom lane
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
> The accesses to an index are far more likely to be clustered than the accesses to the underlying table, because the index is organized in a way that's application-meaningful and the table not so much.

So, to clarify, are you saying that if queries were actually requesting rows uniformly at random, then there would be no reason to suspect that index accesses would have hotspots? It seems like the index structure (i.e., the top node in b-trees) could also get in the way.

Best, Nathan
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert,

> All true. I suspect that in practice the difference between random and sequential memory page costs is small enough to be ignorable, although of course I might be wrong.

This hasn't been my experience, although I have not carefully measured it. In fact, there's good reason to suppose that, if you were selecting 50% or more of a table, sequential access would still be faster even for an entirely in-memory table.

As a parallel to our development, Redis used to store all data as linked lists, making every object lookup effectively a random lookup. They found that even with a database which is pinned in memory, creating a data page structure (they call it ziplists) and supporting sequential scans was up to 10X faster for large lists.

So I would assume that there is still a coefficient difference between seeks and scans in memory until proven otherwise.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
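For what it's worth, the claim that access order matters even fully in memory is easy to probe from user space. The following crude experiment (my own, not from the thread) sums the same in-memory array once in sequential order and once in a random permutation, so any timing gap comes purely from the access pattern, with no disk involved. It says nothing about PostgreSQL internals, and interpreter overhead in Python dampens the effect compared to C, so treat it as illustrative only.

```python
import random
import time

N = 1_000_000
data = list(range(N))
seq_order = range(N)                      # sequential visit order
rand_order = random.sample(range(N), N)   # same indices, shuffled

def scan(order):
    """Sum data[] visiting elements in the given order, returning
    (total, elapsed seconds)."""
    t0 = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - t0

seq_total, seq_t = scan(seq_order)
rand_total, rand_t = scan(rand_order)

# Both scans touch exactly the same elements, so the sums must agree;
# only the timings differ.
print(f"sequential: {seq_t:.3f}s  random: {rand_t:.3f}s")
```

On typical hardware the random-order pass is noticeably slower because it defeats hardware prefetching and cache-line reuse, which is the "coefficient difference between seeks and scans in memory" Josh is positing.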
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Stuart,

> I think random_page_cost causes problems because I need to combine disk random access time, which I can measure, with a guesstimate of the disk cache hit rate.

See, that's wrong. Disk cache hit rate is what effective_cache_size (ECS) is for.

Really, there are several factors which should go into the planner's estimates to determine the probability of a table being cached:

* ratio between total database size and ECS
* ratio between table size and ECS
* ratio between index size and ECS
* whether the table is "hot" or not
* whether the index is "hot" or not

The last two statistics are critically important for good estimation, and they are not things we currently collect. By "hot" I mean: is this a relation which is accessed several times per minute/hour and is thus likely to be in the cache when we need it? Currently, we have no way of knowing that. Without hot statistics, we're left with guessing based on size, which results in bad plans for small tables in large databases which are accessed infrequently.

Mind you, for large tables it would be even better to go beyond that and actually have some knowledge of which disk pages might be in cache. However, I think that's beyond feasibility for current software/OSes.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
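Josh's factor list can be turned into a toy scoring function to make the shape of the idea concrete. Everything below — the function name, the weights, the hot shortcut — is invented for illustration; the planner does nothing like this, and the point is only how the listed ratios and the "hot" flag would interact.

```python
def cache_probability(rel_pages, db_pages, ecs_pages, is_hot):
    """Toy estimate of the probability that a relation's pages are cached,
    blending the factors from Josh's list. All numbers are illustrative."""
    if is_hot:
        return 0.99                    # hot relations are assumed resident
    if db_pages <= ecs_pages:
        return 0.95                    # the whole database fits in cache
    # A cold relation competes with everything else for the cache:
    share = ecs_pages / db_pages       # ratio of total DB size to ECS
    fit = min(1.0, ecs_pages / rel_pages)  # ratio of relation size to ECS
    return max(0.0, min(1.0, share * fit))
```

Note how the sketch reproduces the failure mode Josh describes: a small but rarely-touched table in a huge database (`is_hot=False`, large `db_pages`) scores low on size-based guessing alone, so the hot/cold statistic is what would actually rescue the estimate.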
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Sun, May 15, 2011 at 2:08 PM, Josh Berkus j...@agliodbs.com wrote:
>> All true. I suspect that in practice the difference between random and sequential memory page costs is small enough to be ignorable, although of course I might be wrong.
>
> This hasn't been my experience, although I have not carefully measured it. In fact, there's good reason to suppose that, if you were selecting 50% or more of a table, sequential access would still be faster even for an entirely in-memory table.
>
> As a parallel to our development, Redis used to store all data as linked lists, making every object lookup effectively a random lookup. They found that even with a database which is pinned in memory, creating a data page structure (they call it ziplists) and supporting sequential scans was up to 10X faster for large lists.
>
> So I would assume that there is still a coefficient difference between seeks and scans in memory until proven otherwise.

Well, anything's possible. But I wonder whether the effects you are describing might result from a reduction in the *number* of pages accessed rather than a change in the access pattern.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
2011/5/15 Josh Berkus j...@agliodbs.com:
> The last two statistics are critically important for good estimation, and they are not things we currently collect. By "hot" I mean: is this a relation which is accessed several times per minute/hour and is thus likely to be in the cache when we need it? Currently, we have no way of knowing that. Without hot statistics, we're left with guessing based on size, which results in bad plans for small tables in large databases which are accessed infrequently.
>
> Mind you, for large tables it would be even better to go beyond that and actually have some knowledge of which

*which*? Do you mean 'area' of the tables?

> disk pages might be in cache. However, I think that's beyond feasibility for current software/OSes.

Maybe not :) mincore is available in many OSes, and Windows has options to get those stats too.

-- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Craig Ringer wrote:
> AFAIK, mincore() is only useful for mmap()ed files and for finding out if it's safe to access certain blocks of memory w/o risking triggering heavy swapping. It doesn't provide any visibility into the OS's block device / file system caches; you can't ask it "how much of this file is cached in RAM" or "is this range of blocks in this file cached in RAM".

You should try out pgfincore if you think this can't be done!

> Even if you could, it's hard to see how an approach that relied on asking the OS through system calls about the cache state when planning every query could be fast enough to be viable.

You can't do it in real-time. You don't necessarily want that even if it were possible; too many possibilities for nasty feedback loops where you always favor using some marginal index that happens to be in memory, and therefore never page in things that would be faster once they're read. The only reasonable implementation that avoids completely unstable plans is to scan this data periodically and save some statistics on it--the way ANALYZE does--and then have that turn into a planner input.

The related secondary idea of just making assumptions about small tables/indexes, too, may be a useful heuristic to layer on top of this. There's a pile of ideas here that all seem reasonable both in terms of modeling real-world behavior and as things that could be inserted into the optimizer. As usual, I suspect that work needs to be followed by a giant testing exercise, though.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On 2011-05-16 03:18, Greg Smith wrote:
> You can't do it in real-time. You don't necessarily want that even if it were possible; too many possibilities for nasty feedback loops where you always favor using some marginal index that happens to be in memory, and therefore never page in things that would be faster once they're read. The only reasonable implementation that avoids completely unstable plans is to scan this data periodically and save some statistics on it--the way ANALYZE does--and then have that turn into a planner input.

Would that be feasible? Have a process collecting the data every now and then, probably applying some conservative-average function, and feeding it into pg_stats for each index/relation? To me it seems like a robust and fairly trivial way to get better numbers. The fear is that the OS cache is too much in flux to get any stable numbers out of it.

Ok, it may not work as well with indexes, since having 1% in cache may very well mean that 90% of all requested blocks are there.. for tables it should be more trivial.

-- Jesper
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Sat, May 14, 2011 at 3:13 AM, Josh Berkus j...@agliodbs.com wrote: This is what the combination of random_page_cost and effective_cache_size ought to supply, but I don't think it does, quite. I think random_page_cost causes problems because I need to combine disk random access time, which I can measure, with a guesstimate of the disk cache hit rate. It would be lovely if these two variables were separate. It would be even lovelier if the disk cache hit rate could be probed at run time and didn't need setting at all, but I suspect that isn't possible on some platforms. -- Stuart Bishop stu...@stuartbishop.net http://www.stuartbishop.net/
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Apr 26, 2011 at 4:37 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Sok Ann Yap sok...@gmail.com wrote: So, index scan wins by a very small margin over sequential scan after the tuning. I am a bit puzzled because index scan is more than 3000 times faster in this case, but the estimated costs are about the same. Did I do something wrong? Tuning is generally needed to get best performance from PostgreSQL. Needing to reduce random_page_cost is not unusual in situations where a good portion of the active data is in cache (between shared_buffers and the OS cache). Please show us your overall configuration and give a description of the hardware (how many of what kind of cores, how much RAM, what sort of storage system). The configuration part can be obtained by running the query on this page and pasting the result into your next post: http://wiki.postgresql.org/wiki/Server_Configuration There are probably some other configuration adjustments you could do to ensure that good plans are chosen. The very first thing to check is effective_cache_size and to set it to a reasonable value. Actually, effective_cache_size has no impact on costing except when planning a nested loop with inner index scan. So, a query against a single table can never benefit from changing that setting. Kevin's suggestion of adjusting seq_page_cost and random_page_cost is the way to go. We've talked in the past (and I still think it's a good idea, but haven't gotten around to doing anything about it) about adjusting the planner to attribute to each relation the percentage of its pages which we believe we'll find in cache. Although many complicated ideas for determining that percentage have been proposed, my favorite one is fairly simple: assume that small relations will be mostly or entirely cached, and that big ones won't be. Allow the administrator to override the result on a per-relation basis. 
It's difficult to imagine a situation where the planner should assume that a relation with only a handful of pages isn't going to be cached. Even if it isn't, as soon as someone begins accessing it, it will be. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
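Robert's heuristic could be as simple as a smooth function of relation size relative to available cache, with the per-relation override always winning. The curve below is my own guess at a reasonable shape, not a proposal from the thread:

```python
def assumed_cached_fraction(rel_pages, effective_cache_pages, override=None):
    """Assume small relations are (nearly) fully cached and large ones
    mostly are not; an explicit per-relation override takes precedence."""
    if override is not None:
        return override
    if rel_pages <= 0:
        return 1.0  # an empty/tiny relation is effectively free to read
    # Falls off smoothly as the relation grows relative to the cache:
    # ~1.0 when rel_pages << cache, ~0 when rel_pages >> cache.
    return min(1.0, effective_cache_pages / (rel_pages + effective_cache_pages))
```

A 10-page lookup table against a 1 GB cache comes out essentially fully cached, a relation ten times the cache size comes out under 10% cached, and the DBA can pin either answer for the synthetic cases where the assumption falls down.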
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert Haas robertmh...@gmail.com wrote: We've talked in the past (and I still think it's a good idea, but haven't gotten around to doing anything about it) about adjusting the planner to attribute to each relation the percentage of its pages which we believe we'll find in cache. Although many complicated ideas for determining that percentage have been proposed, my favorite one is fairly simple: assume that small relations will be mostly or entirely cached, and that big ones won't be. Allow the administrator to override the result on a per-relation basis. It's difficult to imagine a situation where the planner should assume that a relation with only a handful of pages isn't going to be cached. Even if it isn't, as soon as someone begins accessing it, it will be. Simple as the heuristic is, I bet it would be effective. While one can easily construct a synthetic case where it falls down, the ones I can think of aren't all that common, and you are suggesting an override mechanism. -Kevin
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Robert Haas robertmh...@gmail.com writes: On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote: The very first thing to check is effective_cache_size and to set it to a reasonable value. Actually, effective_cache_size has no impact on costing except when planning a nested loop with inner index scan. So, a query against a single table can never benefit from changing that setting. That's flat out wrong. It does affect the cost estimate for plain indexscan (and bitmap indexscan) plans. regards, tom lane
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Fri, May 13, 2011 at 3:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Apr 26, 2011 at 9:04 PM, Merlin Moncure mmonc...@gmail.com wrote: The very first thing to check is effective_cache_size and to set it to a reasonable value. Actually, effective_cache_size has no impact on costing except when planning a nested loop with inner index scan. So, a query against a single table can never benefit from changing that setting. That's flat out wrong. It does affect the cost estimate for plain indexscan (and bitmap indexscan) plans. *rereads code* OK, I agree. I obviously misinterpreted this code the last time I read it. I guess maybe the reason why it didn't matter for the OP is that - if the size of the index in pages is smaller than the pro-rated fraction of effective_cache_size allowed to the index - then the exact value doesn't affect the answer. I apparently need to study this code more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
I guess maybe the reason why it didn't matter for the OP is that - if the size of the index in pages is smaller than the pro-rated fraction of effective_cache_size allowed to the index - then the exact value doesn't affect the answer. I apparently need to study this code more. FWIW: random_page_cost is meant to be the ratio between the cost of looking up a single row as an index lookup, and the cost of looking up that same row as part of a larger sequential scan. For specific storage, that coefficient should be roughly the same regardless of the table size. So if your plan for optimization involves manipulating RPC for anything other than a change of storage, you're Doing It Wrong. Instead, we should be fixing the formulas these are based on and leaving RPC alone. For any data page, there are actually four costs associated with each tuple lookup, per:

in-memory/seq    | on disk/seq
-----------------+----------------
in-memory/random | on disk/random

(yes, there's actually more for bitmapscan etc. but the example holds) For any given tuple lookup, then, you can assign a cost based on where you think that tuple falls in that quadrant map. Since this is all probability-based, you'd be assigning a cost as a mixed % of in-memory and on-disk costs. Improvements in accuracy of this formula would come through improvements in accuracy in predicting if a particular data page will be in memory. This is what the combination of random_page_cost and effective_cache_size ought to supply, but I don't think it does, quite. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
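Josh's quadrant map amounts to a probability-weighted blend of the four costs. A sketch, with illustrative constants (the memory-access cost is not an existing PostgreSQL GUC; seq/random in memory are collapsed into one value, per Robert's later observation that they barely differ):

```python
def expected_page_cost(p_cached, sequential,
                       seq_disk=1.0, random_disk=4.0, memory=0.01):
    """Blend the four quadrant costs by the probability that the page
    is already in memory. p_cached is the estimated cache-hit chance."""
    disk = seq_disk if sequential else random_disk
    return p_cached * memory + (1 - p_cached) * disk
```

With this shape, a fully cached random fetch costs ~0.01 rather than 4, and a half-cached sequential page costs about 0.5, which is exactly the behavior people currently fake by hand-lowering random_page_cost.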
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
2011/5/13 Josh Berkus j...@agliodbs.com: I guess maybe the reason why it didn't matter for the OP is that - if the size of the index in pages is smaller than the pro-rated fraction of effective_cache_size allowed to the index - then the exact value doesn't affect the answer. I apparently need to study this code more. FWIW: random_page_cost is meant to be the ratio between the cost of looking up a single row as an index lookup, and the cost of looking up that same row as part of a larger sequential scan. For specific storage, that coefficient should be roughly the same regardless of the table size. So if your plan for optimization involves manipulating RPC for anything other than a change of storage, you're Doing It Wrong. Instead, we should be fixing the formulas these are based on and leaving RPC alone. For any data page, there are actually four costs associated with each tuple lookup, per: in-memory/seq | on disk/seq + in-memory/random | on disk/random It lacks some more theoretical ones like sort_page/temp_page: those are based on a ratio of seq_page_cost and random_page_cost, or a simple seq_page_cost (when working out of work_mem). Memory access is accounted for with some 0.1 in some places AFAIR (and memory random/seq is the same at the level of estimation we do). (yes, there's actually more for bitmapscan etc. but the example holds) (if I read the sources correctly, for this one there is a linear approach to weight the cost between random_page_cost and seq_page_cost on the heap page fetch, plus the Mackert and Lohman formula, if needed, in its best usage: predicting what should be in cache *because* of the current query execution, not because of the current status of the page cache) For any given tuple lookup, then, you can assign a cost based on where you think that tuple falls in that quadrant map. Since this is all probability-based, you'd be assigning a cost as a mixed % of in-memory and on-disk costs.
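The Mackert-Lohman estimate Cédric mentions predicts how many distinct page fetches a scan of N tuples will cost, given a table of T pages and an assumed buffer allotment b to absorb repeat visits. A Python transcription of the approximation as I read it in PostgreSQL's costsize.c (treat as a sketch, not authoritative; the real code also rounds with ceil):

```python
def index_pages_fetched(tuples_fetched, rel_pages, cache_pages):
    """Mackert-Lohman approximation of distinct heap pages fetched when
    tuples_fetched tuples are read from a relation of rel_pages pages,
    with cache_pages of buffer available to catch repeat visits."""
    T = max(float(rel_pages), 1.0)
    b = max(float(cache_pages), 1.0)
    N = max(float(tuples_fetched), 0.0)
    if T <= b:
        # Whole table fits in the allotment: fetches saturate at T.
        return min((2.0 * T * N) / (2.0 * T + N), T)
    # Table bigger than the allotment: below the limit behaves the same,
    # above it every extra tuple tends to cost a fresh page.
    lim = (2.0 * T * b) / (2.0 * T - b)
    if N <= lim:
        return (2.0 * T * N) / (2.0 * T + N)
    return min(b + (N - lim) * (T - b) / T, T)
```

This is the piece that already models caching caused *by the query itself*; the thread's debate is about modeling cache contents that exist before the query starts.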
Improvements in accuracy of this formula would come through improvements in accuracy in predicting if a particular data page will be in memory. This is what the combination of random_page_cost and effective_cache_size ought to supply, but I don't think it does, quite. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Fri, May 13, 2011 at 4:13 PM, Josh Berkus j...@agliodbs.com wrote: Instead, we should be fixing the formulas these are based on and leaving RPC alone. For any data page, there are actually four costs associated with each tuple lookup, per: All true. I suspect that in practice the difference between random and sequential memory page costs is small enough to be ignorable, although of course I might be wrong. I've never seen a database that was fully cached in memory where it was necessary to set random_page_cost > seq_page_cost to get good plans -- no doubt partly because even if the pages were consecutive on disk, there's no reason to suppose they would be so in memory, and we certainly wouldn't know one way or the other at planning time. But I agree we should add a cached_page_cost as part of all this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Sok Ann Yap sok...@gmail.com wrote: Anyway, the overhead of spawning 44 extra queries means that it is still better off for me to stick with the original query and tune PostgreSQL to choose index scan. Maybe, but what is *best* for you is to tune PostgreSQL so that your costs are accurately modeled, at which point it will automatically pick the best plan for most or all of your queries without you needing to worry about it. If you set your effective_cache_size to the sum of shared_buffers and what your OS reports as cache after you've been running a while, that will help the optimizer know what size index fits in RAM, and will tend to encourage index use. If the active portion of your data is heavily cached, you might want to set random_page_cost and seq_page_cost to the same value, and make that value somewhere in the 0.1 to 0.05 range. If you have moderate caching, using 1 and 2 can be good. If you're still not getting reasonable plans, please post again with more information about your hardware along with the query and its EXPLAIN ANALYZE output. -Kevin I understand the need to tune PostgreSQL properly for my use case. What I am curious about is, for the data set I have, under what circumstances (hardware/workload/cache status/etc) would a sequential scan really be faster than an index scan for that particular query? To simulate a scenario when nothing is cached, I stopped PostgreSQL, dropped all system cache (sync; echo 3 > /proc/sys/vm/drop_caches), restarted PostgreSQL, and ran the query. A sequential scan run took 13.70 seconds, while an index scan run took 0.34 seconds, which is still 40 times faster. Also, I tried increasing effective_cache_size from 512MB to 3GB (the database size is 2+GB), and it still favors sequential scan. The estimated costs did not change at all.
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Sok Ann Yap wrote: Kevin Grittner wrote: Please show us your overall configuration and give a description of the hardware (how many of what kind of cores, how much RAM, what sort of storage system). Here's the configuration (this is just a low end laptop): version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2, 64-bit checkpoint_segments | 16 default_statistics_target | 1 Usually overkill. If this didn't help, you should probably change it back. effective_cache_size | 512MB lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 listen_addresses | * log_destination | syslog log_min_duration_statement | 0 maintenance_work_mem | 256MB max_connections | 100 You probably don't need this many connections. max_stack_depth | 2MB port | 5432 random_page_cost | 4 server_encoding | UTF8 shared_buffers | 256MB silent_mode | on TimeZone | Asia/Kuala_Lumpur wal_buffers | 1MB work_mem | 32MB (20 rows) It's hard to recommend other changes without knowing the RAM on the system. How many of what kind of CPUs would help, too. The thing is, the query I posted was fairly simple (I think), and PostgreSQL should be able to choose the 3000+ times faster index scan with the default random_page_cost of 4. It picks the plan with the lowest estimated cost. If it's not picking the best plan, that's usually an indication that you need to adjust cost factors so that estimates better model the actual costs. If I need to reduce it to 2 when using a 5.4k rpm slow disk, what is random_page_cost = 4 good for? It's good for large databases with a lot of physical disk I/O. In fact, in some of those cases, it needs to be higher. In your test, the numbers indicate that everything was cached in RAM. That makes the effective cost very low. 
Also, the odds are that you have more total cache space between the shared_buffers and the OS cache than the effective_cache_size setting, so the optimizer doesn't expect the number of cache hits you're getting on index usage. -Kevin Thanks for the tips and explanation. I wrongly assumed the random_page_cost value is independent from caching. Now, let's go back to the original query: SELECT salutations.id, salutations.name, EXISTS ( SELECT 1 FROM contacts WHERE salutations.id = contacts.salutation_id ) AS in_use FROM salutations If I split up the query, i.e. running this once: SELECT salutations.id, salutations.name FROM salutations and then running this 44 times, once for each row: SELECT EXISTS ( SELECT 1 FROM contacts WHERE contacts.salutation_id = ? ) AS in_use I can see that PostgreSQL will smartly pick the best plan, i.e. for common salutations (Madam, Ms, etc), it will do a sequential scan, while for salutations that are rarely used or not used at all, it will do an index scan. Anyway, the overhead of spawning 44 extra queries means that it is still better off for me to stick with the original query and tune PostgreSQL to choose index scan.
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap sok...@gmail.com wrote: I understand the need to tune PostgreSQL properly for my use case. What I am curious about is, for the data set I have, under what circumstances (hardware/workload/cache status/etc) would a sequential scan really be faster than an index scan for that particular query? The sequential scan on contacts can be terminated as soon as the first matching row is found. If each block of the contacts table contains one example of each salutation, then the inner sequential scan will always be very short, and faster than an index scan. I can engineer this to be the case by populating the table like this: insert into contacts select (generate_series%44+1)::int from generate_series (1,100); Here I get the seq scan being 2.6ms while the index scan is 5.6ms. Predicting how far the inner scan needs to go would be quite difficult, and I don't know how the system will do it. However, when I create and populate simple tables based on your description, I get the index scan being the lower estimated cost. So the tables I built are not sufficient to study the matter in detail. Cheers, Jeff
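Jeff's early-termination point can be quantified: if k matching tuples are scattered uniformly among N tuples, a scan that stops at the first match reads about (N+1)/(k+1) tuples on average, a standard order-statistics result. A back-of-envelope model (not what the planner actually computes):

```python
def expected_tuples_until_first_match(n_tuples, n_matches):
    """Expected number of tuples a sequential scan reads before it can stop
    at the first match, assuming matches are scattered uniformly."""
    if n_matches == 0:
        return n_tuples  # must scan everything to prove there is no match
    return (n_tuples + 1) / (n_matches + 1)
```

For a common salutation (roughly 1,000,000/44 matches) the inner scan stops after ~44 tuples, i.e. within the first block or two, while a salutation with a single match averages half a million tuples, which is why the best plan flips per-value exactly as the OP observed with the split-up queries.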
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Sok Ann Yap wrote: Kevin Grittner wrote: Please show us your overall configuration and give a description of the hardware (how many of what kind of cores, how much RAM, what sort of storage system). Here's the configuration (this is just a low end laptop): version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2, 64-bit checkpoint_segments | 16 default_statistics_target | 1 Usually overkill. If this didn't help, you should probably change it back. effective_cache_size | 512MB lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 listen_addresses | * log_destination | syslog log_min_duration_statement | 0 maintenance_work_mem | 256MB max_connections | 100 You probably don't need this many connections. max_stack_depth | 2MB port | 5432 random_page_cost | 4 server_encoding | UTF8 shared_buffers | 256MB silent_mode | on TimeZone | Asia/Kuala_Lumpur wal_buffers | 1MB work_mem | 32MB (20 rows) It's hard to recommend other changes without knowing the RAM on the system. How many of what kind of CPUs would help, too. The thing is, the query I posted was fairly simple (I think), and PostgreSQL should be able to choose the 3000+ times faster index scan with the default random_page_cost of 4. It picks the plan with the lowest estimated cost. If it's not picking the best plan, that's usually an indication that you need to adjust cost factors so that estimates better model the actual costs. If I need to reduce it to 2 when using a 5.4k rpm slow disk, what is random_page_cost = 4 good for? It's good for large databases with a lot of physical disk I/O. In fact, in some of those cases, it needs to be higher. In your test, the numbers indicate that everything was cached in RAM. That makes the effective cost very low. 
Also, the odds are that you have more total cache space between the shared_buffers and the OS cache than the effective_cache_size setting, so the optimizer doesn't expect the number of cache hits you're getting on index usage. -Kevin
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Sok Ann Yap sok...@gmail.com wrote: Anyway, the overhead of spawning 44 extra queries means that it is still better off for me to stick with the original query and tune PostgreSQL to choose index scan. Maybe, but what is *best* for you is to tune PostgreSQL so that your costs are accurately modeled, at which point it will automatically pick the best plan for most or all of your queries without you needing to worry about it. If you set your effective_cache_size to the sum of shared_buffers and what your OS reports as cache after you've been running a while, that will help the optimizer know what size index fits in RAM, and will tend to encourage index use. If the active portion of your data is heavily cached, you might want to set random_page_cost and seq_page_cost to the same value, and make that value somewhere in the 0.1 to 0.05 range. If you have moderate caching, using 1 and 2 can be good. If you're still not getting reasonable plans, please post again with more information about your hardware along with the query and its EXPLAIN ANALYZE output. -Kevin
[PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Hi, I am using PostgreSQL 9.0. There is a salutations table with 44 rows, and a contacts table with more than a million rows. The contacts table has a nullable (only 0.002% null) salutation_id column, referencing salutations.id. With this query: SELECT salutations.id, salutations.name, salutations.description, EXISTS ( SELECT 1 FROM contacts WHERE salutations.id = contacts.salutation_id ) AS in_use FROM salutations I have to reduce random_page_cost from 4 to 2 to force index scan. EXPLAIN ANALYSE output with random_page_cost = 4:

Seq Scan on salutations (cost=0.00..50.51 rows=44 width=229) (actual time=0.188..3844.037 rows=44 loops=1)
  SubPlan 1
    -> Seq Scan on contacts (cost=0.00..64578.41 rows=57906 width=0) (actual time=87.358..87.358 rows=1 loops=44)
         Filter: ($0 = salutation_id)
Total runtime: 3844.113 ms

EXPLAIN ANALYSE output with random_page_cost = 4, enable_seqscan = 0:

Seq Scan on salutations (cost=100.00..195.42 rows=44 width=229) (actual time=0.053..0.542 rows=44 loops=1)
  SubPlan 1
    -> Index Scan using ix_contacts_salutation_id on contacts (cost=0.00..123682.07 rows=57906 width=0) (actual time=0.011..0.011 rows=1 loops=44)
         Index Cond: ($0 = salutation_id)
Total runtime: 0.592 ms

EXPLAIN ANALYSE output with random_page_cost = 2:

Seq Scan on salutations (cost=0.00..48.87 rows=44 width=229) (actual time=0.053..0.541 rows=44 loops=1)
  SubPlan 1
    -> Index Scan using ix_contacts_salutation_id on contacts (cost=0.00..62423.45 rows=57906 width=0) (actual time=0.011..0.011 rows=1 loops=44)
         Index Cond: ($0 = salutation_id)
Total runtime: 0.594 ms

So, index scan wins by a very small margin over sequential scan after the tuning. I am a bit puzzled because index scan is more than 3000 times faster in this case, but the estimated costs are about the same. Did I do something wrong? Regards, Yap
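Incidentally, the estimated index scan cost here is linear in random_page_cost, so the two runs above pin it down: fitting cost(rpc) = a*rpc + b to 123682.07 at rpc=4 and 62423.45 at rpc=2 puts the crossover with the 64578.41 seq scan estimate almost exactly at rpc = 2.07, which is why dropping to 2 flips the plan by such a thin margin. A quick check of that arithmetic:

```python
# Fit the index scan estimate cost(rpc) = a*rpc + b from the two EXPLAIN
# runs in this message, then find where it undercuts the seq scan estimate.
cost_at_4, cost_at_2 = 123682.07, 62423.45   # index scan total costs
seq_cost = 64578.41                          # seq scan total cost

a = (cost_at_4 - cost_at_2) / (4 - 2)        # rpc-proportional component
b = cost_at_2 - 2 * a                        # rpc-independent component
crossover_rpc = (seq_cost - b) / a
print(round(crossover_rpc, 2))               # about 2.07
```

Nothing here says the model is *right* (the index scan is 3000x faster, after all); it only shows how razor-thin the planner's preference was.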
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Sok Ann Yap sok...@gmail.com wrote: So, index scan wins by a very small margin over sequential scan after the tuning. I am a bit puzzled because index scan is more than 3000 times faster in this case, but the estimated costs are about the same. Did I do something wrong? Tuning is generally needed to get best performance from PostgreSQL. Needing to reduce random_page_cost is not unusual in situations where a good portion of the active data is in cache (between shared_buffers and the OS cache). Please show us your overall configuration and give a description of the hardware (how many of what kind of cores, how much RAM, what sort of storage system). The configuration part can be obtained by running the query on this page and pasting the result into your next post: http://wiki.postgresql.org/wiki/Server_Configuration There are probably some other configuration adjustments you could do to ensure that good plans are chosen. -Kevin