On Thu, Aug 23, 2018 at 11:10 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Rebased up to HEAD, per cfbot nagging.  Still no substantive change from
> v2.

I happened to have the opportunity to talk to Tom about this patch in
person. I expressed some very general concerns that are worth repeating
publicly.

This patch adds an enhancement that is one example of a broader class of
optimizer enhancements, primarily aimed at giving star-schema queries
more efficient plans by arranging to use several independent nested loop
joins that follow a common pattern. Each nestloop join has one
particular dimension table on the outer side, and the fact table on the
inner side. The query plan is not so much a tree as it is a DAG
(directed acyclic graph), because the fact table is visited multiple
times. (There are already cases in Postgres in which the query plan is
technically a DAG, actually, but it could be taken much further.)

Aside from being inherently more efficient, DAG-like star schema plans
are also *ideal* targets for parallel query. The executor can execute
each nested loop join in a parallel worker with minimal contention,
because the inner side of each nestloop join probes a different fact
table index from the others. It's almost like executing several
different simple queries concurrently, with some serial processing at
the end. Even that serial processing can sometimes be minimized by
having some of the parallel workers use a Bloom filter in shared memory.

Tom is already concerned that the optimization added by this patch may
be too much of a special case, which is understandable. It may be that
we're failing to identify some greater opportunity to add DAG-like plans
for star schema queries.

--
Peter Geoghegan
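
To make the plan shape described above concrete, here is a rough sketch in Python rather than anything resembling executor code. Everything in it is hypothetical: the toy fact table, the `build_index`, `nestloop`, and `star_join` names, and the use of threads as a stand-in for parallel workers. It also assumes that the serial step at the end is an intersection of the fact row ids found by each join, which is one plausible way to combine them, not something the patch does.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact table: row id -> (date_id, store_id, amount).
FACT = {
    0: (1, 10, 5.0),
    1: (1, 11, 7.5),
    2: (2, 10, 3.0),
    3: (2, 12, 9.0),
    4: (1, 10, 1.25),
}


def build_index(column):
    """Stand-in for a fact table index on one foreign key column."""
    index = defaultdict(set)
    for row_id, row in FACT.items():
        index[row[column]].add(row_id)
    return index


def nestloop(dimension_keys, fact_index):
    """One nestloop join: qualifying dimension keys on the outer side,
    index probes into the fact table on the inner side."""
    matches = set()
    for key in dimension_keys:
        matches |= fact_index.get(key, set())
    return matches


def star_join(keys_per_dimension, indexes):
    """Run one nestloop per dimension concurrently, then combine serially."""
    # Each join probes only its own index, so the workers share no
    # mutable state. Threads merely imitate parallel workers here.
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(nestloop, keys_per_dimension, indexes))
    # Assumed serial step: keep fact rows that every join matched.
    return sorted(set.intersection(*partial))


# Fact rows with date_id = 1 and store_id in (10, 12).
rows = star_join([[1], [10, 12]], [build_index(0), build_index(1)])
print(rows)
```

The fact table is visited once per dimension, which is what makes the plan a DAG rather than a tree. The final intersection is the serial processing that a shared-memory Bloom filter could help shrink.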