On 02/26/2010 02:57 PM, Tom Lane wrote:
Mark Mielke<m...@mark.mielke.cc>  writes:
There must be some way to lift the cost of planning out of the plan
enumeration and selection phase, such that only plan enumeration and
selection is run at execute time. In most cases, plan enumeration and
selection, provided that all data required to make these decisions is
all cached in data structures ready to go, should be very fast? Right?
Huh?  What exactly do you think the cost of planning is, if not
enumeration and selection?  There isn't very much that's cacheable,
at least not in any sanely-sized cache.

I think most operations, including this one, can be broken into a fixed portion and a dynamic portion. The PREPARE should concern itself only with the fixed portion, and should leave the dynamic portion to EXECUTE. At present, the "planning process" is one big blob.

Here are parts that can be done "fixed":

1) Statement parsing and error checking.
2) Identification of tables and columns involved in the query.
3) Query the column statistics for involved columns, to be used in plan cost estimation now and later. 4) Determine plan constraints under which elements of the plan must be executed a certain way (something like constant folding for a compiler), or for which parameter substitution would not impact the outcome. 5) Identify the elements of the plan that still require plan enumeration and plan selection, to be used in a later part of the pipeline.

At a minimum, I am suggesting that 1), 2), and 3) should take a chunk out of the planning process. I think 4) and 5) are more complex but still valuable in terms of extracting the fixed portion out of the planning process.

I think an assumption is being made that the planning process is an atomic unit that cannot be turned into a pipeline or assembly line. I think this assumption was what originally tied PREPARE = PLAN, and EXECUTE = RUN. I think this assumption is leading to the conclusion that EXECUTE should re-plan. I also expect that this assumption is tightly woven into the current implementation and changing it would require some amount of re-architecture. :-)

By "not worth it", do you mean development effort or run time?
Run time.  The development cost of what you are proposing is negligible:
just rip out the plan cache altogether.  I don't believe it would be a
performance win though.

That's not my proposal, though. I'm suspecting you didn't read it. :-)

I'm fine with you saying "too hard and not worth my development effort" after you read it. I agree it would be a lot of work.

But if the conclusion is that the current architecture is the best that can be had, and the decision is only about when to do a custom re-plan or when to use the generic plan, I am putting my opinion out there that the generic plan has always been a compromise, and it will always be a compromise, and that this discussion exists primarily because the compromise is not adequate in many real world scenarios.

And that all said, I think I am challenging the status quo and ticking people off. So while my intent is to challenge the status quo, it is not to tick people off. So, please let me know if you would like me to continue, or if you have already written this off. :-)

Cheers,
mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to