On Fri, Mar 23, 2012 at 1:51 PM, Greg Stark <st...@mit.edu> wrote:
> Well it's not entirely unlikely. If you step back a web application
> looks like a big loop with a switch statement to go to different
> pages. It keeps executing the same loop over and over again and there
> are only a smallish number of web pages. Sure the bind variables
> change but there will only be so many bind values and 10% of those
> will get 90% of the traffic too.
That may be true, but lots of web applications have millions of users. The fact that a few hundred thousand of those may account for most of the traffic doesn't seem like it's going to help much unless the total number of users is small; and in that case it's plenty fast enough without a cache anyway.

> But the other thing that happens is that people run multiple queries
> aggregating or selecting from the same subset of data. So you often
> get things like
>
> select count(*) from (<complex subquery>)
> select * from (<complex subquery>) order by foo limit 10
> select * from (<complex subquery>) order by bar limit 10
>
> for the same <complex subquery>. That means if we could cache the rows
> coming out of parts of the plan and remember those rows when we see a
> plan with a common subtree in the plan then we could avoid a lot of
> repetitive work.

Currently, we don't even recognize this situation within a single plan; for example, if you write project pp LEFT JOIN person sr ON pp.sales_rep_id = sr.id LEFT JOIN person pm ON pp.project_manager_id = pm.id, the query planner will happily seq-scan the person table twice to build two copies of the same hash table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
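
[For concreteness, here is a self-contained sketch of the duplicated-scan example above. Only the join shape comes from the message; the table definitions and the select list are assumptions added for illustration.]

-- Hypothetical schema, not from the original message:
CREATE TABLE person (
    id   integer PRIMARY KEY,
    name text
);

CREATE TABLE project (
    id                 integer PRIMARY KEY,
    sales_rep_id       integer REFERENCES person (id),
    project_manager_id integer REFERENCES person (id)
);

-- The join shape from the message: person appears under two aliases, and
-- the planner scans it once per alias and builds a separate hash table
-- for sr and for pm, even though both scans read the same relation.
SELECT pp.id, sr.name AS sales_rep, pm.name AS project_manager
FROM project pp
LEFT JOIN person sr ON pp.sales_rep_id = sr.id
LEFT JOIN person pm ON pp.project_manager_id = pm.id;

[Assuming the planner picks hash joins for a query like this, EXPLAIN would show two independent Seq Scan + Hash pairs over person, which is the duplicated work being described.]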