Re: [HACKERS] [WIP] cache estimates, cache access cost
Greg Smith g...@2ndquadrant.com wrote: On 06/19/2011 06:15 PM, Kevin Grittner wrote: I think the point is that if, on a fresh system, the first access to a table is something which uses a table scan -- like select count(*) -- that all indexed access would then tend to be suppressed for that table. After all, for each individual query, selfishly looking at its own needs in isolation, it likely *would* be faster to use the cached heap data. If those accesses can compete with other activity, such that the data really does stay in the cache rather than being evicted, then what's wrong with that? The problem is that if somehow the index *does* find its way into cache, the queries might all run an order of magnitude faster by using it. The *first* query to bite the bullet and read through the index wouldn't, of course, since it would have all that random disk access. But it's not hard to imagine an application mix where this feature could cause a surprising ten-fold performance drop after someone does a table scan which could persist indefinitely. I'm not risking that in production without a clear mechanism to automatically recover from that sort of cache skew. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Kevin Grittner wrote: But its not hard to imagine an application mix where this feature could cause a surprising ten-fold performance drop after someone does a table scan which could persist indefinitely. I'm not risking that in production without a clear mechanism to automatically recover from that sort of cache skew The idea that any of this will run automatically is a dream at this point, so saying you want to automatically recover from problems with the mechanism that doesn't even exist yet is a bit premature. Some of the implementation ideas here might eventually lead to where real-time cache information is used, and that is where the really scary feedback loops you are right to be worried about come into play. The idea for now is that you'll run this new type of ANALYZE CACHE operation manually, supervised and at a time where recent activity reflects the sort of workload you want to optimize for. And then you should review its results to make sure the conclusions it drew about your cache population aren't really strange. To help with that, I was thinking of writing a sanity check tool that showed how the cached percentages this discovers compare against the historical block hit percentages for the relation. An example of how values changed from what they were already set to after a second ANALYZE CACHE is probably useful too. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Greg Smith g...@2ndquadrant.com wrote: The idea that any of this will run automatically is a dream at this point, so saying you want to automatically recover from problems with the mechanism that doesn't even exist yet is a bit premature. Well, I certainly didn't mean it to be a reason not to move forward with development -- I wouldn't have raised the issue had you not said this upthread: I don't see how sequential scan vs. index costing will be any different on a fresh system than it is now. All I was saying is: I do; here's how... Carry on. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Tue, Jun 14, 2011 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. 3. Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. I feel like this is all baseless FUD. ANALYZE isn't perfect but it's our interface for telling postgres to gather stats and we generally agree that having stats and modelling the system behaviour as accurately as practical is the right direction so we need a specific reason why this stat and this bit of modeling is a bad idea before we dismiss it. I think the kernel of truth in these concerns is simply that everything else ANALYZE looks at mutates only on DML. If you load the same data into two databases and run ANALYZE you'll get (modulo random sampling) the same stats. And if you never modify it and analyze it again a week later you'll get the same stats again. So autovacuum can guess when to run analyze based on the number of DML operations, it can run it without regard to how busy the system is, and it can hold off on running it if the data hasn't changed. In the case of the filesystem buffer cache the cached percentage will vary over time regardless of whether the data changes. Plain select queries will change it, even other activity outside the database will change it. There are a bunch of strategies for mitigating this problem: we might want to look at the cache situation more frequently, discount the results we see since more aggressively, and possibly maintain a kind of running average over time. There's another problem which I haven't seen mentioned. Because the access method will affect the cache there's the possibility of feedback loops. e.g. A freshly loaded system prefers sequential scans for a given table because without the cache the seeks of random reads are too expensive... causing it to never load that table into cache... causing that table to never be cached and never switch to an index method. It's possible there are mitigation strategies for this as well such as keeping a running average over time and discounting the estimates with some heuristic values. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
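To make the "running average" mitigation concrete, here is a minimal sketch in C (not taken from any posted patch; the struct, function name, and the 0.2 decay factor are purely illustrative) of how a per-relation cached fraction could be smoothed across successive observations rather than trusting any single snapshot:

```c
/* Illustrative only: smooth successive cache observations so one
 * unrepresentative snapshot cannot swing plan costs by itself. */
typedef struct RelCacheStats
{
    double      cached_fraction;    /* smoothed estimate used for costing */
} RelCacheStats;

static void
update_cached_fraction(RelCacheStats *stats, double observed_fraction)
{
    const double alpha = 0.2;       /* weight of the newest sample (arbitrary) */

    stats->cached_fraction = alpha * observed_fraction
                           + (1.0 - alpha) * stats->cached_fraction;
}
```

With such smoothing, a burst of unrelated I/O that briefly evicts a table only nudges the estimate instead of flipping plans immediately.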
Re: [HACKERS] [WIP] cache estimates, cache access cost
2011/6/19 Greg Stark st...@mit.edu: On Tue, Jun 14, 2011 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. 3. Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. I feel like this is all baseless FUD. ANALYZE isn't perfect but it's our interface for telling postgres to gather stats and we generally agree that having stats and modelling the system behaviour as accurately as practical is the right direction so we need a specific reason why this stat and this bit of modeling is a bad idea before we dismiss it. I think the kernel of truth in these concerns is simply that everything else ANALYZE looks at mutates only on DML. If you load the same data into two databases and run ANALYZE you'll get (modulo random sampling) the same stats. And if you never modify it and analyze it again a week later you'll get the same stats again. So autovacuum can guess when to run analyze based on the number of DML operations, it can run it without regard to how busy the system is, and it can hold off on running it if the data hasn't changed. In the case of the filesystem buffer cache the cached percentage will vary over time regardless of whether the data changes. Plain select queries will change it, even other activity outside the database will change it. There are a bunch of strategies for mitigating this problem: we might want to look at the cache situation more frequently, discount the results we see since more aggressively, and possibly maintain a kind of running average over time. Yes. There's another problem which I haven't seen mentioned. Because the access method will affect the cache there's the possibility of feedback loops. e.g. A freshly loaded system prefers sequential scans for a given table because without the cache the seeks of random reads are too expensive... causing it to never load that table into cache... causing that table to never be cached and never switch to an index method. It's possible there are mitigation strategies for this as well Yeah, that's one of the problems to solve. So far I've tried to keep a planner which behaves as it currently does when rel_oscache == 0, so a fresh server will have the same planning as a server without rel_oscache. Those points are to be solved in the cost estimates (and selfuncs). For this case, there is a balance between page filtering cost and index access cost. *And* once the table is in cache, the index costs less and can be better because it needs less filtering (fewer rows, fewer pages, less work). There is also a possible issue here (if using the index removes the table from cache), but I am not too afraid of that right now. such as keeping a running average over time and discounting the estimates with some heuristic values. Yes, definitely something to think about. My biggest fear here is for shared servers (when there is competition between services to use the OS cache, shooting down kernel cache strategies). -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Sun, Jun 19, 2011 at 9:38 AM, Greg Stark st...@mit.edu wrote: On Tue, Jun 14, 2011 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. 3. Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. I feel like this is all baseless FUD. ANALYZE isn't perfect but it's our interface for telling postgres to gather stats and we generally agree that having stats and modelling the system behaviour as accurately as practical is the right direction so we need a specific reason why this stat and this bit of modeling is a bad idea before we dismiss it. I think the kernel of truth in these concerns is simply that everything else ANALYZE looks at mutates only on DML. If you load the same data into two databases and run ANALYZE you'll get (modulo random sampling) the same stats. And if you never modify it and analyze it again a week later you'll get the same stats again. So autovacuum can guess when to run analyze based on the number of DML operations, it can run it without regard to how busy the system is, and it can hold off on running it if the data hasn't changed. In the case of the filesystem buffer cache the cached percentage will vary over time regardless of whether the data changes. Plain select queries will change it, even other activity outside the database will change it. There are a bunch of strategies for mitigating this problem: we might want to look at the cache situation more frequently, discount the results we see since more aggressively, and possibly maintain a kind of running average over time. There's another problem which I haven't seen mentioned. Because the access method will affect the cache there's the possibility of feedback loops. e.g. A freshly loaded system prefers sequential scans for a given table because without the cache the seeks of random reads are too expensive... causing it to never load that table into cache... causing that table to never be cached and never switch to an index method. It's possible there are mitigation strategies for this as well such as keeping a running average over time and discounting the estimates with some heuristic values. *scratches head* Well, yeah. I completely agree with you that these are the things we need to worry about. Maybe I did a bad job explaining myself, because ISTM you said my concerns were FUD and then went on to restate them in different words. I'm not bent out of shape about using ANALYZE to try to gather the information. That's probably a reasonable approach if it turns out we actually need to do it at all. I am not sure we do. What I've argued for in the past is that we start by estimating the percentage of the relation that will be cached based on its size relative to effective_cache_size, and allow the administrator to override the percentage on a per-relation basis if it turns out to be wrong. That would avoid all of these concerns and allow us to focus on the issue of how the caching percentages impact the choice of plan, and whether the plans that pop out are in fact better when you provide information on caching as input. If we have that facility in core, then people can write scripts or plug-in modules to do ALTER TABLE .. 
SET (caching_percentage = XYZ) every hour or so based on the sorts of statistics that Cedric is gathering here, and users will be able to experiment with a variety of algorithms and determine which ones work the best. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
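For illustration, the starting point Robert describes might look roughly like the sketch below. The exact formula is not something he specified; the names are invented, and a negative override is used here simply to mean "the administrator has not set one":

```c
/* Rough sketch of "guess the cached fraction from relation size versus
 * effective_cache_size, unless the DBA overrode it per relation";
 * this is not existing planner code. */
static double
guess_cached_fraction(double rel_pages,
                      double effective_cache_size_pages,
                      double dba_override)      /* < 0 means "not set" */
{
    if (dba_override >= 0.0)
        return dba_override;        /* per-relation caching percentage */

    if (rel_pages <= 0.0)
        return 0.0;

    /* a relation that fits comfortably within the cache is assumed cached */
    if (rel_pages <= effective_cache_size_pages)
        return 1.0;

    /* otherwise assume the cache holds a proportional slice of it */
    return effective_cache_size_pages / rel_pages;
}
```

The appeal of this shape is that it is deterministic: two systems with the same schema and settings plan the same way, and any drift comes only from explicit administrator overrides.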
Re: [HACKERS] [WIP] cache estimates, cache access cost
2011/6/19 Robert Haas robertmh...@gmail.com: On Sun, Jun 19, 2011 at 9:38 AM, Greg Stark st...@mit.edu wrote: On Tue, Jun 14, 2011 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. 3. Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. I feel like this is all baseless FUD. ANALYZE isn't perfect but it's our interface for telling postgres to gather stats and we generally agree that having stats and modelling the system behaviour as accurately as practical is the right direction so we need a specific reason why this stat and this bit of modeling is a bad idea before we dismiss it. I think the kernel of truth in these concerns is simply that everything else ANALYZE looks at mutates only on DML. If you load the same data into two databases and run ANALYZE you'll get (modulo random sampling) the same stats. And if you never modify it and analyze it again a week later you'll get the same stats again. So autovacuum can guess when to run analyze based on the number of DML operations, it can run it without regard to how busy the system is, and it can hold off on running it if the data hasn't changed. In the case of the filesystem buffer cache the cached percentage will vary over time regardless of whether the data changes. Plain select queries will change it, even other activity outside the database will change it. There are a bunch of strategies for mitigating this problem: we might want to look at the cache situation more frequently, discount the results we see since more aggressively, and possibly maintain a kind of running average over time. There's another problem which I haven't seen mentioned. Because the access method will affect the cache there's the possibility of feedback loops. e.g. A freshly loaded system prefers sequential scans for a given table because without the cache the seeks of random reads are too expensive... causing it to never load that table into cache... causing that table to never be cached and never switch to an index method. It's possible there are mitigation strategies for this as well such as keeping a running average over time and discounting the estimates with some heuristic values. *scratches head* Well, yeah. I completely agree with you that these are the things we need to worry about. Maybe I did a bad job explaining myself, because ISTM you said my concerns were FUD and then went on to restate them in different words. I'm not bent out of shape about using ANALYZE to try to gather the information. That's probably a reasonable approach if it turns out we actually need to do it at all. I am not sure we do. What I've argued for in the past is that we start by estimating the percentage of the relation that will be cached based on its size relative to effective_cache_size, and allow the administrator to override the percentage on a per-relation basis if it turns out to be wrong. That would avoid all of these concerns and allow us to focus on the issue of how the caching percentages impact the choice of plan, and whether the plans that pop out are in fact better when you provide information on caching as input. 
If we have that facility in core, then people can write scripts or plug-in modules to do ALTER TABLE .. SET (caching_percentage = XYZ) every hour or so based on the sorts of statistics that Cedric is gathering here, and users will be able to experiment with a variety of algorithms and determine which ones work the best. Robert, I am very surprised. My patch does offer that. 1st, I used ANALYZE because it is the way to update pg_class I found. You are suggesting ALTER TABLE instead, that is fine, but give me that lock-free :) else we have the ahem.. Alvaro's pg_class_ng (I find this one interesting because it will be a lot easier to have different values on standby servers if we find a way to have pg_class_ng 'updatable' per server). So, as long as the value can be changed without problem, I don't care where it resides. 2nd, I provided the patches on the last CF exactly to allow us to go to the exciting part: the cost-estimates changes. (After all, we can work on the cost estimates, and if later we find a way to use ALTER TABLE or pg_class_ng, just do it instead of via the ANALYZE magic.) 3rd, you can right now write a plugin to set the value of rel_oscache (exactly like the one you'd write for an ALTER TABLE SET reloscache...): RelationGetRelationOSCacheInFork(Relation relation, ForkNumber forkNum) { float4 percent = 0; /* if a plugin is present, let it manage things */ if (OSCache_hook)
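The hook body above breaks off mid-function in the archive. For orientation, a standalone sketch of the mmap() + mincore() technique such a plugin could use (Linux mincore() semantics assumed; this is not Cédric's actual plugin code, and error handling is minimal) might look like:

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Estimate the fraction of a relation segment file resident in the OS cache. */
static float
os_cached_fraction(int fd, size_t file_size)
{
    long        pagesize = sysconf(_SC_PAGESIZE);
    size_t      npages = (file_size + pagesize - 1) / pagesize;
    size_t      resident = 0;
    size_t      i;
    unsigned char *vec;
    void       *map;

    if (npages == 0)
        return 0.0f;

    map = mmap(NULL, file_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 0.0f;

    vec = malloc(npages);
    if (vec != NULL && mincore(map, file_size, vec) == 0)
    {
        for (i = 0; i < npages; i++)
            if (vec[i] & 1)         /* low bit set => page is resident */
                resident++;
    }

    free(vec);
    munmap(map, file_size);

    return (float) resident / (float) npages;
}
```

A hook installed at a point like OSCache_hook could call something of this shape per relation fork and hand the resulting fraction back for storage.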
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/19/2011 09:38 AM, Greg Stark wrote: There's another problem which I haven't seen mentioned. Because the access method will affect the cache there's the possibility of feedback loops. e.g. A freshly loaded system prefers sequential scans for a given table because without the cache the seeks of random reads are too expensive... Not sure if it's been mentioned in this thread yet, but the feedback issue has popped up in regard to this area plenty of times. I think everyone who's producing regular input into this is aware of it, even if it's not mentioned regularly. I'm not too concerned about the specific case you warned about because I don't see how sequential scan vs. index costing will be any different on a fresh system than it is now. But there are plenty of cases like it to be mapped out here, and many are not solvable--they're just something that needs to be documented as a risk. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Greg Smith wrote: I'm not too concerned about the specific case you warned about because I don't see how sequential scan vs. index costing will be any different on a fresh system than it is now. I think the point is that if, on a fresh system, the first access to a table is something which uses a table scan -- like select count(*) -- that all indexed access would then tend to be suppressed for that table. After all, for each individual query, selfishly looking at its own needs in isolation, it likely *would* be faster to use the cached heap data. I see two ways out of that -- one hard and one easy. One way would be to somehow look at the impact on the cache of potential plans and the resulting impact on overall throughput of the queries being run with various cache contents. That's the hard one, in case anyone wasn't clear. ;-) The other way would be to run some percentage of the queries *without* considering current cache contents, so that the cache can eventually adapt to the demands. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
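Kevin's "easy" option amounts to an exploration step. A toy illustration (nothing like this exists in the patch; the sampling-rate parameter and function name are made up):

```c
#include <stdlib.h>

/* Decide whether this particular planning cycle should ignore the cache
 * estimates, so the cache population can still adapt over time instead of
 * being frozen by the feedback loop described above. */
static int
ignore_cache_estimates_this_time(double sample_fraction)   /* e.g. 0.05 */
{
    return ((double) rand() / (double) RAND_MAX) < sample_fraction;
}
```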
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/19/2011 06:15 PM, Kevin Grittner wrote: I think the point is that if, on a fresh system, the first access to a table is something which uses a table scan -- like select count(*) -- that all indexed access would then tend to be suppressed for that table. After all, for each individual query, selfishly looking at its own needs in isolation, it likely *would* be faster to use the cached heap data. If those accesses can compete with other activity, such that the data really does stay in the cache rather than being evicted, then what's wrong with that? We regularly have people stop by asking how to pin particular relations to the cache, to support exactly this sort of scenario. What I would expect on any mixed workload is that the table would slowly get holes shot in it, as individual sections were evicted for more popular index data. And eventually there'd be little enough left for it to win over an index scan. But if people keep using the copy of the table in memory instead, enough so that it never really falls out of cache, well that's not necessarily even a problem--it could be considered a solution for some. The possibility that people can fit their entire table into RAM and it never leaves there is turning downright probable in some use cases now. A good example is cloud instances using EC2, where people often architect their systems such that the data set put onto any one node fits into RAM. As soon as that's not true you suffer too much from disk issues, so breaking the databases into RAM-sized pieces turns out to be very good practice. It's possible to tune fairly well for this case right now--just make the page costs all low. The harder case that I see a lot is where all the hot data fits into cache, but there's a table or two of history/archives that don't. And that would be easier to do the right thing with given this kind of "what's in the cache?" percentage. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
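The existing knobs Greg alludes to for the fully cached case are the page-cost settings; the values and the tablespace name below are purely illustrative:

```sql
-- Session- or cluster-wide: tell the planner that page fetches are cheap.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;

-- Or per tablespace, for storage that is effectively always in RAM:
ALTER TABLESPACE fast_space SET (seq_page_cost = 0.1, random_page_cost = 0.1);
```

What these knobs cannot express is the mixed case he describes, where one archive table is cold while everything else is hot; that is exactly where a per-relation cached percentage would help.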
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Sun, Jun 19, 2011 at 3:32 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2nd, I provided the patches on the last CF, exactly to allow to go to the exciting part: the cost-estimates changes. (after all, we can work on the cost estimate, and if later we find a way to use ALTER TABLE or pg_class_ng, just do it instead of via the ANALYZE magic) We're talking past each other here, somehow. The cost-estimating part does not require this patch in order to do something useful, but this patch, AFAICT, absolutely does require the cost-estimating part to do something useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [WIP] cache estimates, cache access cost
2011/5/16 Greg Smith g...@2ndquadrant.com: Cédric Villemain wrote: http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache This rebases easily to make Cedric's changes move to the end; I just pushed a version with that change to https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone wants a cleaner one to browse. I've attached a patch too if that's more your thing. I'd recommend not getting too stuck on the particular hook Cédric has added here to compute the cache estimate, which uses mmap and mincore to figure it out. It's possible to compute similar numbers, albeit less accurate, using an approach similar to how pg_buffercache inspects things. And I even once wrote a background writer extension that collected this sort of data as it was running the LRU scan anyway. Discussions of this idea seem to focus on how the what's in the cache? data is collected, which as far as I'm concerned is the least important part. There are multiple options, some work better than others, and there's no reason that can't be swapped out later. The more important question is how to store the data collected and then use it for optimizing queries. Attached are updated patches without the plugin itself. I've also added the cache_page_cost GUC, this one is not per tablespace, like others page_cost. There are 6 patches: 0001-Add-reloscache-column-to-pg_class.patch 0002-Add-a-function-to-update-the-new-pg_class-cols.patch 0003-Add-ANALYZE-OSCACHE-VERBOSE-relation.patch 0004-Add-a-Hook-to-handle-OSCache-stats.patch 0005-Add-reloscache-to-Index-Rel-OptInfo.patch 0006-Add-cache_page_cost-GUC.patch I have some comments on my own code: * I am not sure of the best datatype to use for 'reloscache' * I didn't include the catalog number change in the patch itself. * oscache_update_relstats() is very similar to vac_update_relstats(), maybe better to merge them but reloscache should not be updated at the same time than other stats. * There is probably too much work done in do_oscache_analyze_rel() because I kept vac_open_indexes() (not a big drama atm) * I don't know so much how gram.y works, so I am not sure my changes cover all cases. * No tests; similar columns and GUC does not have test either, but it lacks a test for ANALYZE OSCACHE -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support From d2fe7e85aea31cfe8cd6559a060f71c424fe03af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Villemain?= ced...@2ndquadrant.fr Date: Wed, 25 May 2011 23:17:36 +0200 Subject: [PATCH 1/7] Add reloscache column to pg_class 1 column reloscache to contain the percentage of pages in cache per relation. May be used by the planner and updated with ANALYZE OSCACHE; (not done yet, see next commits) --- doc/src/sgml/catalogs.sgml | 11 + src/backend/catalog/heap.c |1 + src/backend/utils/cache/relcache.c |2 + src/include/catalog/pg_class.h | 44 ++- 4 files changed, 37 insertions(+), 21 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml new file mode 100644 index 8504555..4cfad39 *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 1634,1639 --- 1634,1650 /row row + entrystructfieldreloscache/structfield/entry + entrytypefloat4/type/entry + entry/entry + entry +Percentage of the files in OS cache. This is only an estimate used by +the planner. It is updated by commandANALYZE OSCACHE/command. +By default, the value is not updated and an extension is required. 
+ /entry + /row + + row entrystructfieldreltoastrelid/structfield/entry entrytypeoid/type/entry entryliterallink linkend=catalog-pg-classstructnamepg_class/structname/link.oid/literal/entry diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c new file mode 100644 index a6e541d..2043c40 *** a/src/backend/catalog/heap.c --- b/src/backend/catalog/heap.c *** InsertPgClassTuple(Relation pg_class_des *** 764,769 --- 764,770 values[Anum_pg_class_reltablespace - 1] = ObjectIdGetDatum(rd_rel-reltablespace); values[Anum_pg_class_relpages - 1] = Int32GetDatum(rd_rel-relpages); values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel-reltuples); + values[Anum_pg_class_reloscache - 1] = Float4GetDatum(rd_rel-reloscache); values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel-reltoastrelid); values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel-reltoastidxid); values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel-relhasindex); diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c new file mode 100644 index d7e94ff..ca09e3b *** a/src/backend/utils/cache/relcache.c ---
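Going by the patch descriptions in this message (the diff itself is truncated above), the proposed interface would be exercised roughly as follows. ANALYZE OSCACHE, pg_class.reloscache, and cache_page_cost are additions proposed by these patches, not released PostgreSQL syntax, and the table name is a placeholder:

```sql
-- Refresh the per-relation OS cache estimate stored in pg_class.reloscache.
ANALYZE OSCACHE VERBOSE my_table;

-- Inspect what the estimate looks like.
SELECT relname, reloscache FROM pg_class WHERE relname = 'my_table';

-- Tell the planner what a page found in the OS cache should cost.
SET cache_page_cost = 0.05;
```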
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Tue, Jun 14, 2011 at 10:29 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 0001-Add-reloscache-column-to-pg_class.patch 0002-Add-a-function-to-update-the-new-pg_class-cols.patch 0003-Add-ANALYZE-OSCACHE-VERBOSE-relation.patch 0004-Add-a-Hook-to-handle-OSCache-stats.patch 0005-Add-reloscache-to-Index-Rel-OptInfo.patch 0006-Add-cache_page_cost-GUC.patch It seems to me that posting updated versions of this patch gets us no closer to addressing the concerns I (and Tom, on other threads) expressed about this idea previously. Specifically: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. 3. Even if the data were accurate and did not cause plan instability, we have no evidence that using it will improve real-world performance. Now, it's possible that you or someone else could provide some experimental evidence refuting these points. But right now there isn't any, and until there is, -1 from me on applying any of this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
2011/6/14 Robert Haas robertmh...@gmail.com: On Tue, Jun 14, 2011 at 10:29 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 0001-Add-reloscache-column-to-pg_class.patch 0002-Add-a-function-to-update-the-new-pg_class-cols.patch 0003-Add-ANALYZE-OSCACHE-VERBOSE-relation.patch 0004-Add-a-Hook-to-handle-OSCache-stats.patch 0005-Add-reloscache-to-Index-Rel-OptInfo.patch 0006-Add-cache_page_cost-GUC.patch It seems to me that posting updated versions of this patch gets us no closer to addressing the concerns I (and Tom, on other threads) expressed about this idea previously. Specifically: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. ANALYZE happens when people execute it, else it is auto-analyze, and I am not providing auto-analyze-oscache. ANALYZE OSCACHE is just a very simple wrapper to update pg_class. The frequency is not important here, I believe. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. Again, it is hard to do an UPDATE pg_class SET reloscache, so I used the ANALYZE logic. Also I have taken into account the fact that someone may want to SET the values, as was also suggested, so my patches allow one to say: 'this table is 95% in cache, the DBA said' (it is stable, not based on OS stats). This case has been suggested several times and is covered by my patch. 3. Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. I have not finished my work on cost estimation, and I believe this work will take some time and can be done in another commitfest. At the moment my patches do not change anything in the decision of the planner; they just offer the tools I need to hack cost estimates. Now, it's possible that you or someone else could provide some experimental evidence refuting these points. But right now there isn't any, and until there is, -1 from me on applying any of this. I was trying to split the patch into groups of features to reduce its size. The work is in progress. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/14/2011 11:04 AM, Robert Haas wrote: Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. That's the dependency Cédric has provided us a way to finally make progress on. Everyone says there's no evidence that this whole approach will improve performance. But we can't collect such data, to prove or disprove it helps, without a proof of concept patch that implements *something*. You may not like the particular way the data is collected here, but it's a working implementation that may be useful for some people. I'll take data collected at ANALYZE time as a completely reasonable way to populate the new structures with realistic enough test data to use initially. Surely at least one other way to populate the statistics, and possibly multiple other ways that the user selects, will be needed eventually. I commented a while ago on this thread: every one of these discussions always gets dragged into the details of how the cache statistics data will be collected and rejects whatever is suggested as not good enough. Until that stops, no progress will ever get made on the higher level details. By its nature, developing toward integrating cached percentages is going to lurch forward on both collecting the cache data and using the cache knowledge in queries fronts almost independently. This is not a commit candidate; it's the first useful proof of concept step for something we keep talking about but never really doing. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Tue, Jun 14, 2011 at 1:10 PM, Greg Smith g...@2ndquadrant.com wrote: On 06/14/2011 11:04 AM, Robert Haas wrote: Even if the data were accurate and did not cause plan stability, we have no evidence that using it will improve real-world performance. That's the dependency Cédric has provided us a way to finally make progress on. Everyone says there's no evidence that this whole approach will improve performance. But we can't collect such data, to prove or disprove it helps, without a proof of concept patch that implements *something*. You may not like the particular way the data is collected here, but it's a working implementation that may be useful for some people. I'll take data collected at ANALYZE time as a completely reasonable way to populate the new structures with realistic enough test data to use initially. But there's no reason that code (which may or may not eventually prove useful) has to be incorporated into the main tree. We don't commit code so people can go benchmark it; we ask for the benchmarking to be done first, and then if the results are favorable, we commit the code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Tue, Jun 14, 2011 at 12:06 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. ANALYZE happens when people execute it, else it is auto-analyze and I am not providing auto-analyze-oscache. ANALYZE OSCACHE is just a very simple wrapper to update pg_class. The frequency is not important here, I believe. Well, I'm not saying you have to have all the answers to post a WIP patch, certainly. But in terms of getting something committable, it seems like we need to have at least an outline of what the long-term plan is. If ANALYZE OSCACHE is an infrequent operation, then the data isn't going to be a reliable guide to what will happen at execution time... 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. ...and if it is a frequent operation then it's going to result in unstable plans (and maybe pg_class bloat). There's a fundamental tension here that I don't think you can just wave your hands at. I was trying to split the patch size by group of features to reduce its size. The work is in progress. Totally reasonable, but I can't see committing any of it without some evidence that there's light at the end of the tunnel. No performance tests *whatsoever* have been done. We can debate the exact amount of evidence that should be required to prove that something is useful from a performance perspective, but we at least need some. I'm beating on this point because I believe that the whole idea of trying to feed this information back into the planner is going to turn out to be something that we don't want to do. I think it's going to turn out to have downsides that are far larger than the upsides. I am completely willing to be be proven wrong, but right now I think this will make things worse and you think it will make things better and I don't see any way to bridge that gap without doing some measurements. For example, if you run this patch on a system and subject that system to a relatively even workload, how much do the numbers bounce around between runs? What if you vary the workload, so that you blast it with OLTP traffic at some times and then run reporting queries at other times? Or different tables become hot at different times? Once you've written code to make the planner do something with the caching % values, then you can start to explore other questions. Can you generate plan instability, especially on complex queries, which are more prone to change quickly based on small changes in the cost estimates? Can you demonstrate a workload where bad performance is inevitable with the current code, but with your code, the system becomes self-tuning and ends up with good performance? What happens if you have a large cold table with a small hot end where all activity is concentrated? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
2011/6/14 Robert Haas robertmh...@gmail.com: On Tue, Jun 14, 2011 at 12:06 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 1. ANALYZE happens far too infrequently to believe that any data taken at ANALYZE time will still be relevant at execution time. ANALYZE happens when people execute it, else it is auto-analyze and I am not providing auto-analyze-oscache. ANALYZE OSCACHE is just a very simple wrapper to update pg_class. The frequency is not important here, I believe. Well, I'm not saying you have to have all the answers to post a WIP patch, certainly. But in terms of getting something committable, it seems like we need to have at least an outline of what the long-term plan is. If ANALYZE OSCACHE is an infrequent operation, then the data isn't going to be a reliable guide to what will happen at execution time... Ok. 2. Using data gathered by ANALYZE will make plans less stable, and our users complain not infrequently about the plan instability we already have, therefore we should not add more. ...and if it is a frequent operation then it's going to result in unstable plans (and maybe pg_class bloat). There's a fundamental tension here that I don't think you can just wave your hands at. I don't want to hide that point, which is just correct. The idea is not to have something (which need to be) updated too much but it needs to be taken into account. I was trying to split the patch size by group of features to reduce its size. The work is in progress. Totally reasonable, but I can't see committing any of it without some evidence that there's light at the end of the tunnel. No performance tests *whatsoever* have been done. We can debate the exact amount of evidence that should be required to prove that something is useful from a performance perspective, but we at least need some. I'm beating on this point because I believe that the whole idea of trying to feed this information back into the planner is going to turn out to be something that we don't want to do. I think it's going to turn out to have downsides that are far larger than the upsides. it is possible, yes. I try to do changes in a way that if the reloscache values is the one by default then the planner keep the same behavior than in the past. I am completely willing to be be proven wrong, but right now I think this will make things worse and you think it will make things better and I don't see any way to bridge that gap without doing some measurements. correct. For example, if you run this patch on a system and subject that system to a relatively even workload, how much do the numbers bounce around between runs? What if you vary the workload, so that you blast it with OLTP traffic at some times and then run reporting queries at other times? Or different tables become hot at different times? This is all true, this is *already* true. Like the thread about random_page_cost vs index_page_cost where the good option is to change the parameters at certain moment in the day (IIRC the use case). I mean that I agree that those benchs need to be done, hopefully I can fix some usecases, while not breaking others too much or not at all, or ... Once you've written code to make the planner do something with the caching % values, then you can start to explore other questions. Can you generate plan instability, especially on complex queries, which are more prone to change quickly based on small changes in the cost estimates? 
Can you demonstrate a workload where bad performance is inevitable with the current code, but with your code, the system My next step is cost estimation changes. I already have some very small use cases where the minimal changes I did so far are interesting, but it is not enough to present as evidence. becomes self-tuning and ends up with good performance? What happens if you have a large cold table with a small hot end where all activity is concentrated? We are at step 3 here :-) I already have some ideas to handle those situations, but they are not yet polished. The current idea is to be conservative, like PostgreSQL used to be, for example: /* * disk and cache costs * this assumes agnostic knowledge of the data distribution and query * usage, even though large tables may have a hot part of 10% which is the only * requested part, or we may select only (c)old data so the cache is useless. * We keep the original strategy to not guess too much and just weight * the cost globally. */ run_cost += baserel->pages * ( spc_seq_page_cost * (1 - baserel->oscache) + cache_page_cost * baserel->oscache); -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent
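A quick worked example of that blended line, with invented numbers (a 10,000-page table, seq_page_cost = 1.0, cache_page_cost = 0.05, and reloscache reporting 90% of the table in cache):

```c
/* Invented numbers, only to show how the blend behaves. */
static double
example_seq_scan_run_cost(void)
{
    double pages = 10000.0;
    double spc_seq_page_cost = 1.0;
    double cache_page_cost = 0.05;
    double oscache = 0.9;           /* fraction believed to be in the OS cache */

    /* 10000 * (1.0 * 0.1 + 0.05 * 0.9) = 10000 * 0.145 = 1450 */
    return pages * (spc_seq_page_cost * (1.0 - oscache)
                    + cache_page_cost * oscache);
}
```

The same scan that would be costed at 10,000 when assumed fully uncached comes out at 1,450 here, which is the size of shift that makes the plan-stability questions above concrete.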
Re: [HACKERS] [WIP] cache estimates, cache access cost
Excerpts from Cédric Villemain's message of mar jun 14 10:29:36 -0400 2011: Attached are updated patches without the plugin itself. I've also added the cache_page_cost GUC, this one is not per tablespace, like others page_cost. There are 6 patches: 0001-Add-reloscache-column-to-pg_class.patch Hmm, do you really need this to be a new column? Would it work to have it be a reloption? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
2011/6/14 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Cédric Villemain's message of mar jun 14 10:29:36 -0400 2011: Attached are updated patches without the plugin itself. I've also added the cache_page_cost GUC, this one is not per tablespace, like others page_cost. There are 6 patches: 0001-Add-reloscache-column-to-pg_class.patch Hmm, do you really need this to be a new column? Would it work to have it be a reloption? If we can have ALTER TABLE running on a heavy workload, why not. I am a bit scared by the effect of such a reloption; it focuses on a HINT-oriented strategy when I would like to allow a dynamic strategy from the server. This work is not done and may not work, so a reloption is good at least as a backup (and is more in line with the idea suggested by Tom and others). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Excerpts from Cédric Villemain's message of mar jun 14 17:10:20 -0400 2011: If we can have ALTER TABLE running on heavy workload, why not. I am bit scared by the effect of such reloption, it focus on HINT oriented strategy when I would like to allow a dynamic strategy from the server. This work is not done and may not work, so a reloption is good at least as a backup (and is more in the idea suggested by Tom and others) Hmm, sounds like yet another use case for pg_class_nt. Why do these keep popping up? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Cédric Villemain's message of mar jun 14 10:29:36 -0400 2011: 0001-Add-reloscache-column-to-pg_class.patch Hmm, do you really need this to be a new column? Would it work to have it be a reloption? If it's to be updated in the same way as ANALYZE updates reltuples and relpages (ie, an in-place non-transactional update), I think it'll have to be a real column. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/14/2011 01:16 PM, Robert Haas wrote: But there's no reason that code (which may or may not eventually prove useful) has to be incorporated into the main tree. We don't commit code so people can go benchmark it; we ask for the benchmarking to be done first, and then if the results are favorable, we commit the code. Who said anything about this being a commit candidate? The WIP in the subject says it's not intended to be. The community asks people to submit design ideas early so that ideas around them can be explored publicly. One of the things that needs to be explored, and that could use some community feedback, is exactly how this should be benchmarked in the first place. This topic--planning based on cached percentage--keeps coming up, but hasn't gone very far as an abstract discussion. Having a patch to test lets it turn to a concrete one. Note that I already listed myself as the reviewer here, so it's not even like this is asking explicitly for a community volunteer to help. Would you like us to research this privately and then dump a giant patch that is commit candidate quality on everyone six months from now, without anyone else getting input to the process, or would you like the work to happen here? I recommended Cédric not ever bother soliciting ideas early, because I didn't want to get into this sort of debate. I avoid sending anything here unless I already have a strong idea about the solution, because it's hard to keep criticism at bay even with that. He was more optimistic about working within the community contribution guidelines and decided to send this over early instead. If you feel this is too rough to even discuss, I'll mark it returned with feedback and we'll go develop this ourselves. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Greg Smith wrote: On 06/14/2011 01:16 PM, Robert Haas wrote: But there's no reason that code (which may or may not eventually prove useful) has to be incorporated into the main tree. We don't commit code so people can go benchmark it; we ask for the benchmarking to be done first, and then if the results are favorable, we commit the code. Who said anything about this being a commit candidate? The WIP in the subject says it's not intended to be. The community asks people to submit design ideas early so that ideas around them can be explored publicly. One of the things that needs to be explored, and that could use some community feedback, is exactly how this should be benchmarked in the first place. This topic--planning based on cached percentage--keeps coming up, but hasn't gone very far as an abstract discussion. Having a patch to test lets it turn to a concrete one. Note that I already listed myself as the reviewer here, so it's not even like this is asking explicitly for a community volunteer to help. Would you like us to research this privately and then dump a giant patch that is commit candidate quality on everyone six months from now, without anyone else getting input to the process, or would you like the work to happen here? I recommended C?dric not ever bother soliciting ideas early, because I didn't want to get into this sort of debate. I avoid sending anything here unless I already have a strong idea about the solution, because it's hard to keep criticism at bay even with that. He was more optimistic about working within the community contribution guidelines and decided to send this over early instead. If you feel this is too rough to even discuss, I'll mark it returned with feedback and we'll go develop this ourselves. I would like to see us continue researching in this direction. I think perhaps the background writer would be ideal for collecting this information because it scans the buffer cache already, and frequently. (Yes, I know it can't access databases.) I think random_page_cost is a dead-end --- it will never be possible for it to produce the right value for us. Its value is tied up in caching, e.g. the default 4 is not the right value for a physical drive (it should be much higher), but kernel and shared buffer caching require it to be a hybrid number that isn't really realistic. And once we have caching in that number, it is not going to be even caching for all tables, obviously. Hence, there is no way for random_page_cost to be improved and we have to start thinking about alternatives. Basically, random_page_cost is a terrible setting and we have to admit that and move forward. I realize the concerns about unstable plans, and we might need to give users the option of stable plans with a fixed random_page_cost, but at this point we don't even have enough data to know we need that. What we do know is that random_page_cost is inadequate. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
Greg Smith g...@2ndquadrant.com writes: On 06/14/2011 01:16 PM, Robert Haas wrote: But there's no reason that code (which may or may not eventually prove useful) has to be incorporated into the main tree. We don't commit code so people can go benchmark it; we ask for the benchmarking to be done first, and then if the results are favorable, we commit the code. Who said anything about this being a commit candidate? The WIP in the subject says it's not intended to be. The community asks people to submit design ideas early so that ideas around them can be explored publicly. One of the things that needs to be explored, and that could use some community feedback, is exactly how this should be benchmarked in the first place. This topic--planning based on cached percentage--keeps coming up, but hasn't gone very far as an abstract discussion. Having a patch to test lets it turn to a concrete one. Yeah, it *can't* go very far as an abstract discussion ... we need some realistic testing to decide whether this is a good idea, and you can't get that without code. I think the real underlying issue here is that we have this CommitFest process that is focused on getting committable or nearly-committable code into the tree, and it just doesn't fit well for experimental code. I concur with Robert's desire to not push experimental code into the main repository, but we need to have *some* way of working with it. Maybe a separate repo where experimental branches could hang out would be helpful? (Another way of phrasing my point is that WIP is not conveying the true status of this patch. Maybe Experimental would be an appropriate label.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On 06/14/2011 07:08 PM, Tom Lane wrote: I concur with Robert's desire to not push experimental code into the main repository, but we need to have *some* way of working with it. Maybe a separate repo where experimental branches could hang out would be helpful? Well, this one is sitting around in branches at both git.postgresql.org and github so far, both being updated periodically. Maybe there's some value around an official experimental repository too, but I thought that was the idea of individual people having their own directories on git.postgres.org. Do we need something fancier than that? It would be nice, but seems little return on investment to improve that, relative to what you can do easily enough now. The idea David Fetter has been advocating of having a bit rot farm to help detect when the experimental branches drift too far out of date tries to make that concept really formal. I like that idea, too, but find it hard to marshal enough resources to do something about it. The current status quo isn't that terrible; noticing bit rot when it's relevant isn't that hard to do. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] cache estimates, cache access cost
On Tue, Jun 14, 2011 at 6:17 PM, Greg Smith g...@2ndquadrant.com wrote: Who said anything about this being a commit candidate? The WIP in the subject says it's not intended to be. The community asks people to submit design ideas early so that ideas around them can be explored publicly. One of the things that needs to be explored, and that could use some community feedback, is exactly how this should be benchmarked in the first place. This topic--planning based on cached percentage--keeps coming up, but hasn't gone very far as an abstract discussion. Having a patch to test lets it turn to a concrete one. Note that I already listed myself as the reviewer here, so it's not even like this is asking explicitly for a community volunteer to help. Would you like us to research this privately and then dump a giant patch that is commit candidate quality on everyone six months from now, without anyone else getting input to the process, or would you like the work to happen here? I recommended Cédric not ever bother soliciting ideas early, because I didn't want to get into this sort of debate. I avoid sending anything here unless I already have a strong idea about the solution, because it's hard to keep criticism at bay even with that. He was more optimistic about working within the community contribution guidelines and decided to send this over early instead. If you feel this is too rough to even discuss, I'll mark it returned with feedback and we'll go develop this ourselves. My usual trope on this subject is that WIP patches tend to elicit helpful feedback if and only if the patch author is clear about what sort of feedback they are seeking. I'm interested in this topic, so, I'm willing to put some effort into it; but, as I've said before, I think this patch is coming from the wrong end, so in the absence of any specific guidance on what sort of input would be useful, that's the feedback you're getting. Feel free to clarify what would be more helpful. :-) Incidentally, I have done a bit of math around how to rejigger the costing formulas to take cached_page_cost and caching_percentage into account, which I think is the most interesting end place to start this work. If it's helpful, I can write it up in a more organized way and post that; it likely wouldn't be that much work to incorporate it into what Cedric has here already. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
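Robert has not posted his math, but the general shape of such a costing change is presumably a blend of the cached and uncached page costs weighted by the caching percentage, something like the sketch below (the names and the exact form are guesses, not his formula):

```c
/* Blend the cost of a page fetch between the cached and uncached cases. */
static double
blended_page_cost(double uncached_page_cost,    /* seq_ or random_page_cost */
                  double cached_page_cost,
                  double caching_fraction)      /* estimated 0.0 .. 1.0 */
{
    return caching_fraction * cached_page_cost
         + (1.0 - caching_fraction) * uncached_page_cost;
}
```

The interesting part is how much such a blend narrows the gap between sequential and index access costs as the caching fraction rises, which is exactly what the feedback-loop worries earlier in the thread revolve around.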