On 2018/08/29 21:06, Amit Langote wrote:
> I measured the gain in performance due to each patch on a modest virtual
> machine.  Details of the measurement and results follow.
>
> UPDATE:
>
> nparts  master    0001    0002    0003
> ======  ======    ====    ====    ====
>      0    2856    2893    2862    2816
>      8     507    1115    1447    1872
>     16     260     765    1173    1892
>     32     119     483     922    1884
>     64      59     282     615    1881
>    128      29     153     378    1835
>    256      14      79     210    1803
>    512       5      40     113    1728
>   1024       2      17      57    1616
>   2048      0*       9      30    1471
>   4096      0+       4      15    1236
>   8192      0=       2       7     975
>
> For SELECT:
>
> nparts  master    0001    0002    0003
> ======  ======    ====    ====    ====
>      0    2290    2329    2319    2268
>      8    1058    1077    1414    1788
>     16     711     729    1124    1789
>     32     450     475     879    1773
>     64     265     272     603    1765
>    128     146     149     371    1685
>    256      76      77     214    1678
>    512      39      39     112    1636
>   1024      16      17      59    1525
>   2048       8       9      29    1416
>   4096       4       4      15    1195
>   8192       2       2       7     932

Prompted by Tsunakawa-san's comment, I looked at the profiles when running
the benchmark with partitioning and noticed a few things that made clear
why, even with 0003 applied, tps numbers decreased as the number of
partitions increased.  Some functions that appeared high up in the profiles
were related to partitioning:

* set_relation_partition_info calling partition_bounds_copy(), which calls
  datumCopy() on N Datums, where N is the number of partitions.  The more
  partitions there are, the higher up it appears in the profiles.  I suspect
  that this copying might be redundant; the planner can keep using the same
  pointer as the relcache.

There are a few existing and newly introduced sites in the planner where the
code iterates over *all* partitions of a table where processing just the
partitions selected for scanning would suffice.  I observed the following
functions in profiles:

* make_partitionedrel_pruneinfo, which goes over all partitions to generate
  the subplan_map and subpart_map arrays to put into the
  PartitionedRelPruneInfo data structure that it's in charge of generating

* apply_scanjoin_target_to_paths, which goes over all partitions to adjust
  their Paths for applying the required scanjoin target, although most of
  those are dummy ones that won't need the adjustment

* For UPDATE, a couple of functions I introduced in patch 0001 were doing
  the same thing as apply_scanjoin_target_to_paths, which is unnecessary

To fix the above three instances of redundant processing, I added a
Bitmapset 'live_parts' to RelOptInfo, which stores the set of indexes
(into the RelOptInfo.part_rels array) of only the unpruned partitions, and
replaced the for (i = 0; i < rel->nparts; i++) loops in those sites with a
loop that iterates over the members of 'live_parts'.

Results looked promising indeed, especially after applying 0003, which gets
rid of locking all partitions.

UPDATE:

nparts  master    0001    0002    0003
======  ======    ====    ====    ====
     0    2856    2893    2862    2816
     8     507    1115    1466    1845
    16     260     765    1161    1876
    32     119     483     910    1862
    64      59     282     609    1895
   128      29     153     376    1884
   256      14      79     212    1874
   512       5      40     115    1859
  1024       2      17      58    1847
  2048       0       9      29    1883
  4096       0       4      15    1867
  8192       0       2       7    1826

SELECT:

nparts  master    0001    0002    0003
======  ======    ====    ====    ====
     0    2290    2329    2319    2268
     8    1058    1077    1431    1800
    16     711     729    1158    1781
    32     450     475     908    1777
    64     265     272     612    1791
   128     146     149     379    1777
   256      76      77     213    1785
   512      39      39     114    1776
  1024      16      17      59    1756
  2048       8       9      30    1746
  4096       4       4      15    1722
  8192       2       2       7    1706

Note that with 0003, tps doesn't degrade as the number of partitions
increases.

Attached updated patches, with 0002 containing the changes mentioned above.

Thanks,
Amit
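
The loop replacement described in the message above can be illustrated with a
small standalone sketch.  This is not the patch's code: LiveParts, NPARTS,
next_member, visit_all and visit_live are hypothetical names, and a single
64-bit word stands in for PostgreSQL's Bitmapset.  The helper only imitates
the calling convention of bms_next_member() (start at -1, stop on a negative
result).  The point is that the per-partition loop body runs once per live
partition rather than once per partition.

```c
#include <assert.h>
#include <stdint.h>

#define NPARTS 64				/* hypothetical partition count; fits one word */

/* Toy stand-in for a Bitmapset: bit i set means partition i is unpruned. */
typedef uint64_t LiveParts;

/*
 * Return the smallest member greater than 'prev', or -1 if there is none.
 * Callers start with prev = -1, as with bms_next_member().  This toy version
 * tests one bit at a time; it is only meant to show the loop shape.
 */
static int
next_member(LiveParts set, int prev)
{
	int			i;

	assert(prev >= -1);
	for (i = prev + 1; i < NPARTS; i++)
	{
		if (set & (UINT64_C(1) << i))
			return i;
	}
	return -1;
}

/*
 * Old loop shape: enter the body for every partition and skip pruned ones.
 * Returns the number of partitions processed; *visited counts body entries.
 */
static int
visit_all(LiveParts live, int *visited)
{
	int			i;
	int			processed = 0;

	*visited = 0;
	for (i = 0; i < NPARTS; i++)
	{
		(*visited)++;
		if (!(live & (UINT64_C(1) << i)))
			continue;			/* pruned partition, nothing to do */
		processed++;
	}
	return processed;
}

/* New loop shape: enter the body only for members of the live set. */
static int
visit_live(LiveParts live, int *visited)
{
	int			i = -1;
	int			processed = 0;

	*visited = 0;
	while ((i = next_member(live, i)) >= 0)
	{
		(*visited)++;
		processed++;
	}
	return processed;
}
```

With two live partitions out of 64, both functions process the same two
partitions, but visit_all enters its body 64 times and visit_live only twice.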

From 060bd2445ea9cba9adadd73505689d6f06583ee8 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Fri, 24 Aug 2018 12:39:36 +0900
Subject: [PATCH v2 1/3] Overhaul partitioned table update/delete planning

Current method, inheritance_planner, applies grouping_planner and
hence query_planner to the query repeatedly with each leaf partition
replacing the root parent as the query's result relation. One big
drawback of this approach is that it cannot use partprune.c to
perform partition pruning on the partitioned result relation, because
it can only be invoked if query_planner sees the partitioned relation
itself in the query.  That is not true with the existing method,
because as mentioned above, query_planner is invoked with the
partitioned relation replaced with individual leaf partitions.

While most of the work in each repetition of grouping_planner (and
query_planner) is the same, a couple of things may differ from
partition to partition --

1. Join planning may produce different Paths for joining against
   different result partitions,
2. grouping_planner may produce different top-level target lists for
   different partitions, based on their TupleDescs.

This commit rearranges things so that only the planning steps that
affect 1 and 2 above are repeated for partitions that are selected by
query_planner by applying partprune.c based pruning to the original
partitioned result rel.

That makes things faster because

1. partprune.c based pruning is used instead of using constraint
   exclusion for each partition,
2. grouping_planner (and query_planner) is invoked only once instead
   of for every partition, thus saving cycles and memory.

This still doesn't help much if no partitions are pruned, because we
still repeat join planning and make copies of the query for each
partition, but for common cases where only a handful of partitions
remain after pruning, this makes things significantly faster.
--- doc/src/sgml/ddl.sgml | 15 +- src/backend/optimizer/path/allpaths.c | 97 ++++++- src/backend/optimizer/plan/planmain.c | 4 +- src/backend/optimizer/plan/planner.c | 378 ++++++++++++++++++++------- src/backend/optimizer/prep/prepunion.c | 28 +- src/backend/optimizer/util/plancat.c | 30 --- src/test/regress/expected/partition_join.out | 4 +- 7 files changed, 416 insertions(+), 140 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index b5ed1b7939..53c479fbb8 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -3933,16 +3933,6 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; <xref linkend="guc-enable-partition-pruning"/> setting. </para> - <note> - <para> - Currently, pruning of partitions during the planning of an - <command>UPDATE</command> or <command>DELETE</command> command is - implemented using the constraint exclusion method (however, it is - controlled by the <literal>enable_partition_pruning</literal> rather than - <literal>constraint_exclusion</literal>) — see the following section - for details and caveats that apply. - </para> - <para> Execution-time partition pruning currently occurs for the <literal>Append</literal> and <literal>MergeAppend</literal> node types. @@ -3964,9 +3954,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2008-01-01'; <para> <firstterm>Constraint exclusion</firstterm> is a query optimization - technique similar to partition pruning. While it is primarily used - for partitioning implemented using the legacy inheritance method, it can be - used for other purposes, including with declarative partitioning. + technique similar to partition pruning. It is primarily used + for partitioning implemented using the legacy inheritance method. 
</para> <para> diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 0e80aeb65c..5937c0436a 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -36,6 +36,7 @@ #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "optimizer/plancat.h" +#include "optimizer/planmain.h" #include "optimizer/planner.h" #include "optimizer/prep.h" #include "optimizer/restrictinfo.h" @@ -119,6 +120,9 @@ static void set_namedtuplestore_pathlist(PlannerInfo *root, RelOptInfo *rel, static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist); +static RelOptInfo *partitionwise_make_rel_from_joinlist(PlannerInfo *root, + RelOptInfo *parent, + List *joinlist); static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery, pushdown_safety_info *safetyInfo); static bool recurse_pushdown_safe(Node *setOp, Query *topquery, @@ -181,13 +185,30 @@ make_one_rel(PlannerInfo *root, List *joinlist) /* * Generate access paths for the entire join tree. + * + * If we're doing this for an UPDATE or DELETE query whose target is a + * partitioned table, we must do the join planning against each of its + * leaf partitions instead. */ - rel = make_rel_from_joinlist(root, joinlist); + if (root->parse->resultRelation && + root->parse->commandType != CMD_INSERT && + root->simple_rel_array[root->parse->resultRelation] && + root->simple_rel_array[root->parse->resultRelation]->part_scheme) + { + RelOptInfo *rootrel = root->simple_rel_array[root->parse->resultRelation]; - /* - * The result should join all and only the query's base rels. - */ - Assert(bms_equal(rel->relids, root->all_baserels)); + rel = partitionwise_make_rel_from_joinlist(root, rootrel, joinlist); + } + else + { + rel = make_rel_from_joinlist(root, joinlist); + + /* + * The result should join all and only the query's base rels. 
+ */ + Assert(bms_equal(rel->relids, root->all_baserels)); + + } return rel; } @@ -2591,6 +2612,72 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows) } /* + * partitionwise_make_rel_from_joinlist + * performs join planning against each of the leaf partitions contained + * in the partition tree whose root relation is 'parent' + * + * Recursively called for each partitioned table contained in a given + *partition tree. + */ +static RelOptInfo * +partitionwise_make_rel_from_joinlist(PlannerInfo *root, + RelOptInfo *parent, + List *joinlist) +{ + int i; + + Assert(root->parse->resultRelation != 0); + Assert(parent->part_scheme != NULL); + + for (i = 0; i < parent->nparts; i++) + { + RelOptInfo *partrel = parent->part_rels[i]; + AppendRelInfo *appinfo; + List *translated_joinlist; + List *saved_join_info_list = list_copy(root->join_info_list); + + /* Ignore pruned partitions. */ + if (IS_DUMMY_REL(partrel)) + continue; + + /* + * Hack to make the join planning code believe that 'partrel' can + * be joined against. + */ + partrel->reloptkind = RELOPT_BASEREL; + + /* + * Replace references to the parent rel in expressions relevant to join + * planning. + */ + appinfo = root->append_rel_array[partrel->relid]; + translated_joinlist = (List *) + adjust_appendrel_attrs(root, (Node *) joinlist, + 1, &appinfo); + root->join_info_list = (List *) + adjust_appendrel_attrs(root, + (Node *) root->join_info_list, + 1, &appinfo); + /* Reset join planning data structures for a new partition. */ + root->join_rel_list = NIL; + root->join_rel_hash = NULL; + + /* Recurse if the partition is itself a partitioned table. */ + if (partrel->part_scheme != NULL) + partrel = partitionwise_make_rel_from_joinlist(root, partrel, + translated_joinlist); + else + /* Perform the join planning and save the resulting relation. 
*/ + parent->part_rels[i] = + make_rel_from_joinlist(root, translated_joinlist); + + root->join_info_list = saved_join_info_list; + } + + return parent; +} + +/* * make_rel_from_joinlist * Build access paths using a "joinlist" to guide the join path search. * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index b05adc70c4..3f0d80eaa6 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -266,7 +266,9 @@ query_planner(PlannerInfo *root, List *tlist, /* Check that we got at least one usable path */ if (!final_rel || !final_rel->cheapest_total_path || - final_rel->cheapest_total_path->param_info != NULL) + final_rel->cheapest_total_path->param_info != NULL || + (final_rel->relid == root->parse->resultRelation && + root->parse->commandType == CMD_INSERT)) elog(ERROR, "failed to construct the join relation"); return final_rel; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 96bf0601a8..076dbd3d62 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -238,6 +238,16 @@ static bool group_by_has_partkey(RelOptInfo *input_rel, List *targetList, List *groupClause); +static void partitionwise_adjust_scanjoin_target(PlannerInfo *root, + RelOptInfo *parent, + List **partition_subroots, + List **partitioned_rels, + List **resultRelations, + List **subpaths, + List **WCOLists, + List **returningLists, + List **rowMarks); + /***************************************************************************** * @@ -959,7 +969,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse, * needs special processing, else go straight to grouping_planner. 
*/ if (parse->resultRelation && - rt_fetch(parse->resultRelation, parse->rtable)->inh) + rt_fetch(parse->resultRelation, parse->rtable)->inh && + rt_fetch(parse->resultRelation, parse->rtable)->relkind != + RELKIND_PARTITIONED_TABLE) inheritance_planner(root); else grouping_planner(root, false, tuple_fraction); @@ -1688,6 +1700,14 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, RelOptInfo *current_rel; RelOptInfo *final_rel; ListCell *lc; + List *orig_parse_tlist = list_copy(parse->targetList); + List *partition_subroots = NIL; + List *partitioned_rels = NIL; + List *partition_resultRelations = NIL; + List *partition_subpaths = NIL; + List *partition_WCOLists = NIL; + List *partition_returningLists = NIL; + List *partition_rowMarks = NIL; /* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */ if (parse->limitCount || parse->limitOffset) @@ -2018,13 +2038,44 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, scanjoin_targets_contain_srfs = NIL; } - /* Apply scan/join target. */ - scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1 - && equal(scanjoin_target->exprs, current_rel->reltarget->exprs); - apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets, - scanjoin_targets_contain_srfs, - scanjoin_target_parallel_safe, - scanjoin_target_same_exprs); + /* + * For an UPDATE/DELETE query whose target is partitioned table, we + * must generate the targetlist for each of its leaf partitions and + * apply that. + */ + if (current_rel->reloptkind == RELOPT_BASEREL && + current_rel->part_scheme && + current_rel->relid == root->parse->resultRelation && + parse->commandType != CMD_INSERT) + { + /* + * scanjoin_target shouldn't have changed from final_target, + * because UPDATE/DELETE doesn't support various features that + * would've required modifications that are performed above. + * That's important because we'll generate final_target freshly + * for each partition in partitionwise_adjust_scanjoin_target. 
+ */ + Assert(scanjoin_target == final_target); + root->parse->targetList = orig_parse_tlist; + partitionwise_adjust_scanjoin_target(root, current_rel, + &partition_subroots, + &partitioned_rels, + &partition_resultRelations, + &partition_subpaths, + &partition_WCOLists, + &partition_returningLists, + &partition_rowMarks); + } + else + { + /* Apply scan/join target. */ + scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1 + && equal(scanjoin_target->exprs, current_rel->reltarget->exprs); + apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets, + scanjoin_targets_contain_srfs, + scanjoin_target_parallel_safe, + scanjoin_target_same_exprs); + } /* * Save the various upper-rel PathTargets we just computed into @@ -2136,93 +2187,119 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, final_rel->useridiscurrent = current_rel->useridiscurrent; final_rel->fdwroutine = current_rel->fdwroutine; - /* - * Generate paths for the final_rel. Insert all surviving paths, with - * LockRows, Limit, and/or ModifyTable steps added if needed. - */ - foreach(lc, current_rel->pathlist) + if (current_rel->reloptkind == RELOPT_BASEREL && + current_rel->relid == root->parse->resultRelation && + current_rel->part_scheme && + parse->commandType != CMD_INSERT) { - Path *path = (Path *) lfirst(lc); - - /* - * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node. - * (Note: we intentionally test parse->rowMarks not root->rowMarks - * here. If there are only non-locking rowmarks, they should be - * handled by the ModifyTable node instead. However, root->rowMarks - * is what goes into the LockRows node.) - */ - if (parse->rowMarks) - { - path = (Path *) create_lockrows_path(root, final_rel, path, - root->rowMarks, - SS_assign_special_param(root)); - } - - /* - * If there is a LIMIT/OFFSET clause, add the LIMIT node. 
- */ - if (limit_needed(parse)) - { - path = (Path *) create_limit_path(root, final_rel, path, - parse->limitOffset, - parse->limitCount, - offset_est, count_est); - } - - /* - * If this is an INSERT/UPDATE/DELETE, and we're not being called from - * inheritance_planner, add the ModifyTable node. - */ - if (parse->commandType != CMD_SELECT && !inheritance_update) - { - List *withCheckOptionLists; - List *returningLists; - List *rowMarks; - - /* - * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, if - * needed. - */ - if (parse->withCheckOptions) - withCheckOptionLists = list_make1(parse->withCheckOptions); - else - withCheckOptionLists = NIL; - - if (parse->returningList) - returningLists = list_make1(parse->returningList); - else - returningLists = NIL; - - /* - * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node - * will have dealt with fetching non-locked marked rows, else we - * need to have ModifyTable do that. - */ - if (parse->rowMarks) - rowMarks = NIL; - else - rowMarks = root->rowMarks; - - path = (Path *) + Path *path = (Path *) create_modifytable_path(root, final_rel, parse->commandType, parse->canSetTag, parse->resultRelation, - NIL, - false, - list_make1_int(parse->resultRelation), - list_make1(path), - list_make1(root), - withCheckOptionLists, - returningLists, - rowMarks, - parse->onConflict, + partitioned_rels, + root->partColsUpdated, + partition_resultRelations, + partition_subpaths, + partition_subroots, + partition_WCOLists, + partition_returningLists, + partition_rowMarks, + NULL, SS_assign_special_param(root)); - } - - /* And shove it into final_rel */ add_path(final_rel, path); } + else + { + /* + * Generate paths for the final_rel. Insert all surviving paths, with + * LockRows, Limit, and/or ModifyTable steps added if needed. + */ + foreach(lc, current_rel->pathlist) + { + Path *path = (Path *) lfirst(lc); + + /* + * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows + * node. 
(Note: we intentionally test parse->rowMarks not + * root->rowMarks here. If there are only non-locking rowmarks, + * they should be handled by the ModifyTable node instead. + * However, root->rowMarks is what goes into the LockRows node.) + */ + if (parse->rowMarks) + { + path = (Path *) + create_lockrows_path(root, final_rel, path, + root->rowMarks, + SS_assign_special_param(root)); + } + + /* + * If there is a LIMIT/OFFSET clause, add the LIMIT node. + */ + if (limit_needed(parse)) + { + path = (Path *) create_limit_path(root, final_rel, path, + parse->limitOffset, + parse->limitCount, + offset_est, count_est); + } + + /* + * If this is an INSERT/UPDATE/DELETE, and we're not being called + * from inheritance_planner, add the ModifyTable node. + */ + if (parse->commandType != CMD_SELECT && !inheritance_update) + { + List *withCheckOptionLists; + List *returningLists; + List *rowMarks; + + /* + * Set up the WITH CHECK OPTION and RETURNING lists-of-lists, + * if needed. + */ + if (parse->withCheckOptions) + withCheckOptionLists = list_make1(parse->withCheckOptions); + else + withCheckOptionLists = NIL; + + if (parse->returningList) + returningLists = list_make1(parse->returningList); + else + returningLists = NIL; + + /* + * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows + * node will have dealt with fetching non-locked marked rows, + * else we need to have ModifyTable do that. 
+ */ + if (parse->rowMarks) + rowMarks = NIL; + else + rowMarks = root->rowMarks; + + path = (Path *) + create_modifytable_path(root, final_rel, + parse->commandType, + parse->canSetTag, + parse->resultRelation, + NIL, + false, + list_make1_int(parse->resultRelation), + list_make1(path), + list_make1(root), + withCheckOptionLists, + returningLists, + rowMarks, + parse->onConflict, + SS_assign_special_param(root)); + } + + /* And shove it into final_rel */ + add_path(final_rel, path); + } + } /* * Generate partial paths for final_rel, too, if outer query levels might @@ -2259,6 +2336,129 @@ grouping_planner(PlannerInfo *root, bool inheritance_update, } /* + * partitionwise_adjust_scanjoin_target + * adjusts query's targetlist for each partition in the partition tree + * whose root is 'parent' and apply it to their paths via + * apply_scanjoin_target_to_paths + * + * Its output also consists of various pieces of information that will go + * into the ModifyTable node that will be created for this query. + */ +static void +partitionwise_adjust_scanjoin_target(PlannerInfo *root, + RelOptInfo *parent, + List **subroots, + List **partitioned_rels, + List **resultRelations, + List **subpaths, + List **WCOLists, + List **returningLists, + List **rowMarks) +{ + Query *parse = root->parse; + int i; + + *partitioned_rels = lappend(*partitioned_rels, + list_make1_int(parent->relid)); + + for (i = 0; i < parent->nparts; i++) + { + RelOptInfo *child_rel = parent->part_rels[i]; + AppendRelInfo *appinfo; + int relid; + List *tlist; + PathTarget *scanjoin_target; + bool scanjoin_target_parallel_safe; + bool scanjoin_target_same_exprs; + PlannerInfo *partition_subroot; + Query *partition_parse; + + /* Ignore pruned partitions. */ + if (IS_DUMMY_REL(child_rel)) + continue; + + /* + * Extract the original relid of partition to fetch its AppendRelInfo. 
+ * We must find it like this, because + * partitionwise_make_rel_from_joinlist replaces the original rel + * with one generated by join planning which may be different. + */ + relid = -1; + while ((relid = bms_next_member(child_rel->relids, relid)) > 0) + if (root->append_rel_array[relid] && + root->append_rel_array[relid]->parent_relid == + parent->relid) + break; + + appinfo = root->append_rel_array[relid]; + + /* Translate Query structure for this partition. */ + partition_parse = (Query *) + adjust_appendrel_attrs(root, + (Node *) parse, + 1, &appinfo); + + /* Recurse if partition is itself a partitioned table. */ + if (child_rel->part_scheme) + { + root->parse = partition_parse; + partitionwise_adjust_scanjoin_target(root, child_rel, + subroots, + partitioned_rels, + resultRelations, + subpaths, + WCOLists, + returningLists, + rowMarks); + /* Restore the Query for processing the next partition. */ + root->parse = parse; + } + else + { + /* + * Generate a separate PlannerInfo for this partition. We'll need + * it when generating the ModifyTable subplan for this partition. + */ + partition_subroot = makeNode(PlannerInfo); + *subroots = lappend(*subroots, partition_subroot); + memcpy(partition_subroot, root, sizeof(PlannerInfo)); + partition_subroot->parse = partition_parse; + + /* + * Preprocess the translated targetlist and save it in the + * partition's PlannerInfo for the perusal of later planning + * steps. + */ + tlist = preprocess_targetlist(partition_subroot); + partition_subroot->processed_tlist = tlist; + + /* Apply scan/join target. 
*/ + scanjoin_target = create_pathtarget(root, tlist); + scanjoin_target_same_exprs = equal(scanjoin_target->exprs, + child_rel->reltarget->exprs); + scanjoin_target_parallel_safe = + is_parallel_safe(root, (Node *) scanjoin_target->exprs); + apply_scanjoin_target_to_paths(root, child_rel, + list_make1(scanjoin_target), + NIL, + scanjoin_target_parallel_safe, + scanjoin_target_same_exprs); + + /* Collect information that will go into the ModifyTable */ + *resultRelations = lappend_int(*resultRelations, relid); + *subpaths = lappend(*subpaths, child_rel->cheapest_total_path); + if (partition_parse->withCheckOptions) + *WCOLists = lappend(*WCOLists, partition_parse->withCheckOptions); + if (partition_parse->returningList) + *returningLists = lappend(*returningLists, + partition_parse->returningList); + if (partition_parse->rowMarks) + *rowMarks = lappend(*rowMarks, partition_parse->rowMarks); + } + } +} + +/* * Do preprocessing for groupingSets clause and related data. This handles the * preliminary steps of expanding the grouping sets, organizing them into lists * of rollups, and preparing annotations which will later be filled in with @@ -6964,7 +7164,9 @@ apply_scanjoin_target_to_paths(PlannerInfo *root, } /* Build new paths for this relation by appending child paths. 
*/ - if (live_children != NIL) + if (live_children != NIL && + !(rel->reloptkind == RELOPT_BASEREL && + rel->relid == root->parse->resultRelation)) add_paths_to_append_rel(root, rel, live_children); } diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 690b6bbab7..f4c485cdc9 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -2265,8 +2265,34 @@ adjust_appendrel_attrs_mutator(Node *node, context->appinfos); return (Node *) phv; } + + if (IsA(node, SpecialJoinInfo)) + { + SpecialJoinInfo *oldinfo = (SpecialJoinInfo *) node; + SpecialJoinInfo *newinfo = makeNode(SpecialJoinInfo); + + memcpy(newinfo, oldinfo, sizeof(SpecialJoinInfo)); + newinfo->min_lefthand = adjust_child_relids(oldinfo->min_lefthand, + context->nappinfos, + context->appinfos); + newinfo->min_righthand = adjust_child_relids(oldinfo->min_righthand, + context->nappinfos, + context->appinfos); + newinfo->syn_lefthand = adjust_child_relids(oldinfo->syn_lefthand, + context->nappinfos, + context->appinfos); + newinfo->syn_righthand = adjust_child_relids(oldinfo->syn_righthand, + context->nappinfos, + context->appinfos); + newinfo->semi_rhs_exprs = + (List *) expression_tree_mutator((Node *) + oldinfo->semi_rhs_exprs, + adjust_appendrel_attrs_mutator, + (void *) context); + return (Node *) newinfo; + } + /* Shouldn't need to handle planner auxiliary nodes here */ - Assert(!IsA(node, SpecialJoinInfo)); Assert(!IsA(node, AppendRelInfo)); Assert(!IsA(node, PlaceHolderInfo)); Assert(!IsA(node, MinMaxAggInfo)); diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 8369e3ad62..8d67f21f42 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1265,36 +1265,6 @@ get_relation_constraints(PlannerInfo *root, } } - /* - * Append partition predicates, if any. 
- * - * For selects, partition pruning uses the parent table's partition bound - * descriptor, instead of constraint exclusion which is driven by the - * individual partition's partition constraint. - */ - if (enable_partition_pruning && root->parse->commandType != CMD_SELECT) - { - List *pcqual = RelationGetPartitionQual(relation); - - if (pcqual) - { - /* - * Run the partition quals through const-simplification similar to - * check constraints. We skip canonicalize_qual, though, because - * partition quals should be in canonical form already; also, - * since the qual is in implicit-AND format, we'd have to - * explicitly convert it to explicit-AND format and back again. - */ - pcqual = (List *) eval_const_expressions(root, (Node *) pcqual); - - /* Fix Vars to have the desired varno */ - if (varno != 1) - ChangeVarNodes((Node *) pcqual, 1, varno, 0); - - result = list_concat(result, pcqual); - } - } - heap_close(relation, NoLock); return result; diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out index 7d04d12c6e..9074182512 100644 --- a/src/test/regress/expected/partition_join.out +++ b/src/test/regress/expected/partition_join.out @@ -1752,7 +1752,7 @@ WHERE EXISTS ( Filter: (c IS NULL) -> Nested Loop -> Seq Scan on int4_tbl - -> Subquery Scan on ss_1 + -> Subquery Scan on ss -> Limit -> Seq Scan on int8_tbl int8_tbl_1 -> Nested Loop Semi Join @@ -1760,7 +1760,7 @@ WHERE EXISTS ( Filter: (c IS NULL) -> Nested Loop -> Seq Scan on int4_tbl - -> Subquery Scan on ss_2 + -> Subquery Scan on ss -> Limit -> Seq Scan on int8_tbl int8_tbl_2 (28 rows) -- 2.11.0

From 023ce5b76136c3a53bbf6401599d40362b2de719 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Wed, 16 May 2018 14:35:40 +0900
Subject: [PATCH v2 2/3] Lazy creation of partition objects for planning

With the current approach, *all* partitions are opened and range
table entries are created for them in the planner's prep phase, which
is much sooner than when partition pruning is performed.  This means
that query_planner ends up spending cycles and memory on many
partitions that potentially won't be included in the plan, such as
creating RelOptInfos and AppendRelInfos.

To avoid that, add partition range table entries and other planning
data structures for only partitions that remain after applying
partition pruning.

Some code, like that of partitionwise join, relies on the fact that
even though partitions may have been pruned, they would still have a
RelOptInfo, albeit marked dummy, to handle the outer join case where
the pruned partition appears on the nullable side of the join.  So this
commit also teaches the partitionwise join code to allocate dummy
RelOptInfos for pruned partitions.

There are a couple of regression test diffs caused by the fact that we
no longer allocate a duplicate RT entry for a partitioned table in
its role as child, and also that the individual partition RT entries
are now created in the order in which their parents are processed,
whereas previously they'd be added to the range table in the order of
depth-first expansion of the tree.
--- src/backend/optimizer/path/allpaths.c | 65 +++-- src/backend/optimizer/path/joinrels.c | 8 + src/backend/optimizer/plan/initsplan.c | 66 +++++ src/backend/optimizer/plan/planmain.c | 30 --- src/backend/optimizer/plan/planner.c | 19 +- src/backend/optimizer/prep/prepunion.c | 314 +++++++++------------- src/backend/optimizer/util/plancat.c | 16 +- src/backend/optimizer/util/relnode.c | 172 ++++++++++-- src/backend/partitioning/partprune.c | 109 ++++---- src/include/nodes/relation.h | 5 + src/include/optimizer/pathnode.h | 6 + src/include/optimizer/plancat.h | 2 +- src/include/optimizer/planmain.h | 3 + src/include/optimizer/prep.h | 10 + src/include/partitioning/partprune.h | 2 +- src/test/regress/expected/join.out | 22 +- src/test/regress/expected/partition_aggregate.out | 4 +- 17 files changed, 507 insertions(+), 346 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 5937c0436a..9abaab25fa 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -151,6 +151,7 @@ make_one_rel(PlannerInfo *root, List *joinlist) { RelOptInfo *rel; Index rti; + double total_pages; /* * Construct the all_baserels Relids set. @@ -181,6 +182,35 @@ make_one_rel(PlannerInfo *root, List *joinlist) * then generate access paths. */ set_base_rel_sizes(root); + + /* + * We should now have size estimates for every actual table involved in + * the query, and we also know which if any have been deleted from the + * query by join removal; so we can compute total_table_pages. + * + * Note that appendrels are not double-counted here, even though we don't + * bother to distinguish RelOptInfos for appendrel parents, because the + * parents will still have size zero. + * + * XXX if a table is self-joined, we will count it once per appearance, + * which perhaps is the wrong thing ... but that's not completely clear, + * and detecting self-joins here is difficult, so ignore it for now. 
+ */ + total_pages = 0; + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *brel = root->simple_rel_array[rti]; + + if (brel == NULL) + continue; + + Assert(brel->relid == rti); /* sanity check on array */ + + if (IS_SIMPLE_REL(brel)) + total_pages += (double) brel->pages; + } + root->total_table_pages = total_pages; + set_base_rel_pathlists(root); /* @@ -896,8 +926,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, double *parent_attrsizes; int nattrs; ListCell *l; - Relids live_children = NULL; - bool did_pruning = false; /* Guard against stack overflow due to overly deep inheritance tree. */ check_stack_depth(); @@ -913,21 +941,14 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, * partitioned table's list will contain all such indexes. */ if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { rel->partitioned_child_rels = list_make1_int(rti); - /* - * If the partitioned relation has any baserestrictinfo quals then we - * attempt to use these quals to prune away partitions that cannot - * possibly contain any tuples matching these quals. In this case we'll - * store the relids of all partitions which could possibly contain a - * matching tuple, and skip anything else in the loop below. - */ - if (enable_partition_pruning && - rte->relkind == RELKIND_PARTITIONED_TABLE && - rel->baserestrictinfo != NIL) - { - live_children = prune_append_rel_partitions(rel); - did_pruning = true; + /* + * And do prunin. Note that this adds AppendRelInfo's of only the + * partitions that are not pruned. + */ + prune_append_rel_partitions(root, rel); } /* @@ -1178,13 +1199,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, continue; } - if (did_pruning && !bms_is_member(appinfo->child_relid, live_children)) - { - /* This partition was pruned; skip it. 
*/ - set_dummy_rel_pathlist(childrel); - continue; - } - if (relation_excluded_by_constraints(root, childrel, childRTE)) { /* @@ -2629,16 +2643,15 @@ partitionwise_make_rel_from_joinlist(PlannerInfo *root, Assert(root->parse->resultRelation != 0); Assert(parent->part_scheme != NULL); - for (i = 0; i < parent->nparts; i++) + i = -1; + while ((i = bms_next_member(parent->live_parts, i)) >= 0) { RelOptInfo *partrel = parent->part_rels[i]; AppendRelInfo *appinfo; List *translated_joinlist; List *saved_join_info_list = list_copy(root->join_info_list); - /* Ignore pruned partitions. */ - if (IS_DUMMY_REL(partrel)) - continue; + Assert(partrel != NULL); /* * Hack to make the join planning code believe that 'partrel' can diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 7008e1318e..8542b95f4b 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -1369,6 +1369,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, AppendRelInfo **appinfos; int nappinfos; + if (child_rel1 == NULL) + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts); + if (child_rel2 == NULL) + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts); + /* We should never try to join two overlapping sets of rels. 
*/ Assert(!bms_overlap(child_rel1->relids, child_rel2->relids)); child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids); @@ -1407,6 +1412,9 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, populate_joinrel_with_paths(root, child_rel1, child_rel2, child_joinrel, child_sjinfo, child_restrictlist); + if (!IS_DUMMY_REL(child_joinrel)) + joinrel->live_parts = bms_add_member(joinrel->live_parts, + cnt_parts); } } diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 01335db511..beb3e95101 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -132,6 +132,72 @@ add_base_rels_to_query(PlannerInfo *root, Node *jtnode) (int) nodeTag(jtnode)); } +/* + * add_rel_partitions_to_query + * create range table entries and "otherrel" RelOptInfos for the + * partitions of 'rel' specified by the caller + * + * To store the objects thus created, various arrays in 'root' are expanded + * by repalloc'ing them. + */ +void +add_rel_partitions_to_query(PlannerInfo *root, RelOptInfo *rel, + bool scan_all_parts, + Bitmapset *partindexes) +{ + int new_size; + int num_added_parts; + int i; + + Assert(partindexes != NULL || scan_all_parts); + + /* Expand the PlannerInfo arrays to hold new partition objects. */ + num_added_parts = scan_all_parts ? 
rel->nparts : + bms_num_members(partindexes); + new_size = root->simple_rel_array_size + num_added_parts; + root->simple_rte_array = (RangeTblEntry **) + repalloc(root->simple_rte_array, + sizeof(RangeTblEntry *) * new_size); + root->simple_rel_array = (RelOptInfo **) + repalloc(root->simple_rel_array, + sizeof(RelOptInfo *) * new_size); + if (root->append_rel_array) + root->append_rel_array = (AppendRelInfo **) + repalloc(root->append_rel_array, + sizeof(AppendRelInfo *) * new_size); + else + root->append_rel_array = (AppendRelInfo **) + palloc0(sizeof(AppendRelInfo *) * + new_size); + + /* Set the contents of just allocated memory to 0. */ + MemSet(root->simple_rte_array + root->simple_rel_array_size, + 0, sizeof(RangeTblEntry *) * num_added_parts); + MemSet(root->simple_rel_array + root->simple_rel_array_size, + 0, sizeof(RelOptInfo *) * num_added_parts); + MemSet(root->append_rel_array + root->simple_rel_array_size, + 0, sizeof(AppendRelInfo *) * num_added_parts); + root->simple_rel_array_size = new_size; + + /* And add the partitions. 
*/ + if (scan_all_parts) + { + for (i = 0; i < rel->nparts; i++) + { + rel->part_rels[i] = build_partition_rel(root, rel, + rel->part_oids[i]); + rel->live_parts = bms_add_member(rel->live_parts, i); + } + } + else + { + rel->live_parts = partindexes; + i = -1; + while ((i = bms_next_member(partindexes, i)) >= 0) + rel->part_rels[i] = build_partition_rel(root, rel, + rel->part_oids[i]); + } +} /***************************************************************************** * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 3f0d80eaa6..1bd3f0e350 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -57,8 +57,6 @@ query_planner(PlannerInfo *root, List *tlist, Query *parse = root->parse; List *joinlist; RelOptInfo *final_rel; - Index rti; - double total_pages; /* * If the query has an empty join tree, then it's something easy like @@ -232,34 +230,6 @@ query_planner(PlannerInfo *root, List *tlist, extract_restriction_or_clauses(root); /* - * We should now have size estimates for every actual table involved in - * the query, and we also know which if any have been deleted from the - * query by join removal; so we can compute total_table_pages. - * - * Note that appendrels are not double-counted here, even though we don't - * bother to distinguish RelOptInfos for appendrel parents, because the - * parents will still have size zero. - * - * XXX if a table is self-joined, we will count it once per appearance, - * which perhaps is the wrong thing ... but that's not completely clear, - * and detecting self-joins here is difficult, so ignore it for now. 
- */ - total_pages = 0; - for (rti = 1; rti < root->simple_rel_array_size; rti++) - { - RelOptInfo *brel = root->simple_rel_array[rti]; - - if (brel == NULL) - continue; - - Assert(brel->relid == rti); /* sanity check on array */ - - if (IS_SIMPLE_REL(brel)) - total_pages += (double) brel->pages; - } - root->total_table_pages = total_pages; - - /* * Ready to do the primary planning. */ final_rel = make_one_rel(root, joinlist); diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 076dbd3d62..cce6757115 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -2361,7 +2361,8 @@ partitionwise_adjust_scanjoin_target(PlannerInfo *root, *partitioned_rels = lappend(*partitioned_rels, list_make1_int(parent->relid)); - for (i = 0; i < parent->nparts; i++) + i = -1; + while ((i = bms_next_member(parent->live_parts, i)) >= 0) { RelOptInfo *child_rel = parent->part_rels[i]; AppendRelInfo *appinfo; @@ -2373,9 +2374,7 @@ partitionwise_adjust_scanjoin_target(PlannerInfo *root, PlannerInfo *partition_subroot; Query *partition_parse; - /* Ignore pruned partitions. */ - if (IS_DUMMY_REL(child_rel)) - continue; + Assert (child_rel != NULL); /* * Extract the original relid of partition to fetch its AppendRelInfo. @@ -7122,18 +7121,21 @@ apply_scanjoin_target_to_paths(PlannerInfo *root, */ if (rel->part_scheme && rel->part_rels) { - int partition_idx; + int i; List *live_children = NIL; /* Adjust each partition. */ - for (partition_idx = 0; partition_idx < rel->nparts; partition_idx++) + i = -1; + while ((i = bms_next_member(rel->live_parts, i)) >= 0) { - RelOptInfo *child_rel = rel->part_rels[partition_idx]; + RelOptInfo *child_rel = rel->part_rels[i]; ListCell *lc; AppendRelInfo **appinfos; int nappinfos; List *child_scanjoin_targets = NIL; + Assert(child_rel != NULL); + /* Translate scan/join targets for this child. 
*/ appinfos = find_appinfos_by_relids(root, child_rel->relids, &nappinfos); @@ -7237,6 +7239,9 @@ create_partitionwise_grouping_paths(PlannerInfo *root, RelOptInfo *child_grouped_rel; RelOptInfo *child_partially_grouped_rel; + if (child_input_rel == NULL) + continue; + /* Input child rel must have a path */ Assert(child_input_rel->pathlist != NIL); diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index f4c485cdc9..279f686fb0 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -49,6 +49,8 @@ #include "parser/parse_coerce.h" #include "parser/parsetree.h" #include "utils/lsyscache.h" +#include "utils/lsyscache.h" +#include "utils/partcache.h" #include "utils/rel.h" #include "utils/selfuncs.h" #include "utils/syscache.h" @@ -101,21 +103,10 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti); -static void expand_partitioned_rtentry(PlannerInfo *root, - RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, LOCKMODE lockmode, - List **appinfos); -static void expand_single_inheritance_child(PlannerInfo *root, - RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, Relation childrel, - List **appinfos, RangeTblEntry **childrte_p, - Index *childRTindex_p); -static void make_inh_translation_list(Relation oldrelation, - Relation newrelation, - Index newvarno, - List **translated_vars); +static void make_inh_translation_list(TupleDesc old_tupdesc, + TupleDesc new_tupdesc, + RangeTblEntry *oldrte, RangeTblEntry *newrte, + Index newvarno, List **translated_vars); static Bitmapset *translate_col_privs(const Bitmapset *parent_privs, List *translated_vars); static Node *adjust_appendrel_attrs_mutator(Node *node, @@ 
-1522,6 +1513,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) LOCKMODE lockmode; List *inhOIDs; ListCell *l; + List *appinfos = NIL; /* Does RT entry allow inheritance? */ if (!rte->inh) @@ -1585,173 +1577,58 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti) if (oldrc) oldrc->isParent = true; + /* Partitioned tables are expanded elsewhere. */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { + list_free(inhOIDs); + return; + } + /* * Must open the parent relation to examine its tupdesc. We need not lock * it; we assume the rewriter already did. */ oldrelation = heap_open(parentOID, NoLock); - /* Scan the inheritance set and expand it */ - if (RelationGetPartitionDesc(oldrelation) != NULL) + foreach(l, inhOIDs) { - Assert(rte->relkind == RELKIND_PARTITIONED_TABLE); + Oid childOID = lfirst_oid(l); + Index childRTindex = 0; + RangeTblEntry *childrte = NULL; + AppendRelInfo *appinfo = NULL; - /* - * If this table has partitions, recursively expand them in the order - * in which they appear in the PartitionDesc. While at it, also - * extract the partition key columns of all the partitioned tables. - */ - expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc, - lockmode, &root->append_rel_list); + add_inheritance_child_to_query(root, rte, rti, + oldrelation->rd_rel->reltype, + RelationGetDescr(oldrelation), + oldrc, childOID, NoLock, + &appinfo, &childrte, + &childRTindex); + Assert(childRTindex > 1); + Assert(childrte != NULL); + Assert(appinfo != NULL); + appinfos = lappend(appinfos, appinfo); } + + /* + * If all the children were temp tables, pretend it's a + * non-inheritance situation; we don't need Append node in that case. + * The duplicate RTE we added for the parent table is harmless, so we + * don't bother to get rid of it; ditto for the useless PlanRowMark + * node. 
+ */ + if (list_length(appinfos) < 2) + rte->inh = false; else - { - List *appinfos = NIL; - RangeTblEntry *childrte; - Index childRTindex; - - /* - * This table has no partitions. Expand any plain inheritance - * children in the order the OIDs were returned by - * find_all_inheritors. - */ - foreach(l, inhOIDs) - { - Oid childOID = lfirst_oid(l); - Relation newrelation; - - /* Open rel if needed; we already have required locks */ - if (childOID != parentOID) - newrelation = heap_open(childOID, NoLock); - else - newrelation = oldrelation; - - /* - * It is possible that the parent table has children that are temp - * tables of other backends. We cannot safely access such tables - * (because of buffering issues), and the best thing to do seems - * to be to silently ignore them. - */ - if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation)) - { - heap_close(newrelation, lockmode); - continue; - } - - expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc, - newrelation, - &appinfos, &childrte, - &childRTindex); - - /* Close child relations, but keep locks */ - if (childOID != parentOID) - heap_close(newrelation, NoLock); - } - - /* - * If all the children were temp tables, pretend it's a - * non-inheritance situation; we don't need Append node in that case. - * The duplicate RTE we added for the parent table is harmless, so we - * don't bother to get rid of it; ditto for the useless PlanRowMark - * node. - */ - if (list_length(appinfos) < 2) - rte->inh = false; - else - root->append_rel_list = list_concat(root->append_rel_list, - appinfos); - - } + root->append_rel_list = list_concat(root->append_rel_list, + appinfos); heap_close(oldrelation, NoLock); } /* - * expand_partitioned_rtentry - * Recursively expand an RTE for a partitioned table. - * - * Note that RelationGetPartitionDispatchInfo will expand partitions in the - * same order as this code. 
- */ -static void -expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, LOCKMODE lockmode, - List **appinfos) -{ - int i; - RangeTblEntry *childrte; - Index childRTindex; - PartitionDesc partdesc = RelationGetPartitionDesc(parentrel); - - check_stack_depth(); - - /* A partitioned table should always have a partition descriptor. */ - Assert(partdesc); - - Assert(parentrte->inh); - - /* - * Note down whether any partition key cols are being updated. Though it's - * the root partitioned table's updatedCols we are interested in, we - * instead use parentrte to get the updatedCols. This is convenient - * because parentrte already has the root partrel's updatedCols translated - * to match the attribute ordering of parentrel. - */ - if (!root->partColsUpdated) - root->partColsUpdated = - has_partition_attrs(parentrel, parentrte->updatedCols, NULL); - - /* First expand the partitioned table itself. */ - expand_single_inheritance_child(root, parentrte, parentRTindex, parentrel, - top_parentrc, parentrel, - appinfos, &childrte, &childRTindex); - - /* - * If the partitioned table has no partitions, treat this as the - * non-inheritance case. - */ - if (partdesc->nparts == 0) - { - parentrte->inh = false; - return; - } - - for (i = 0; i < partdesc->nparts; i++) - { - Oid childOID = partdesc->oids[i]; - Relation childrel; - - /* Open rel; we already have required locks */ - childrel = heap_open(childOID, NoLock); - - /* - * Temporary partitions belonging to other sessions should have been - * disallowed at definition, but for paranoia's sake, let's double - * check. 
- */ - if (RELATION_IS_OTHER_TEMP(childrel)) - elog(ERROR, "temporary relation from another session found as partition"); - - expand_single_inheritance_child(root, parentrte, parentRTindex, - parentrel, top_parentrc, childrel, - appinfos, &childrte, &childRTindex); - - /* If this child is itself partitioned, recurse */ - if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - expand_partitioned_rtentry(root, childrte, childRTindex, - childrel, top_parentrc, lockmode, - appinfos); - - /* Close child relation, but keep locks */ - heap_close(childrel, NoLock); - } -} - -/* - * expand_single_inheritance_child + * add_inheritance_child_to_query * Build a RangeTblEntry and an AppendRelInfo, if appropriate, plus - * maybe a PlanRowMark. + * maybe a PlanRowMark for a child relation. * * We now expand the partition hierarchy level by level, creating a * corresponding hierarchy of AppendRelInfos and RelOptInfos, where each @@ -1769,19 +1646,70 @@ expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte, * The child RangeTblEntry and its RTI are returned in "childrte_p" and * "childRTindex_p" resp. 
*/ -static void -expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, - Index parentRTindex, Relation parentrel, - PlanRowMark *top_parentrc, Relation childrel, - List **appinfos, RangeTblEntry **childrte_p, - Index *childRTindex_p) +void +add_inheritance_child_to_query(PlannerInfo *root, RangeTblEntry *parentrte, + Index parentRTindex, Oid parentRelType, + TupleDesc parentDesc, + PlanRowMark *top_parentrc, + Oid childOID, int lockmode, + AppendRelInfo **appinfo_p, + RangeTblEntry **childrte_p, + Index *childRTindex_p) { Query *parse = root->parse; - Oid parentOID = RelationGetRelid(parentrel); - Oid childOID = RelationGetRelid(childrel); + Oid parentOID = parentrte->relid; RangeTblEntry *childrte; Index childRTindex; AppendRelInfo *appinfo; + Relation childrel = NULL; + char child_relkind; + Oid child_reltype; + TupleDesc childDesc; + + *appinfo_p = NULL; + *childrte_p = NULL; + *childRTindex_p = 0; + + /* Open rel if needed; we already have required locks */ + if (childOID != parentOID) + { + childrel = heap_open(childOID, lockmode); + + /* + * Temporary partitions belonging to other sessions should have been + * disallowed at definition, but for paranoia's sake, let's double + * check. + */ + if (RELATION_IS_OTHER_TEMP(childrel)) + { + if (childrel->rd_rel->relispartition) + elog(ERROR, "temporary relation from another session found as partition"); + heap_close(childrel, lockmode); + return; + } + + child_relkind = childrel->rd_rel->relkind; + + /* + * No point in adding to the query a partitioned table that has no + * partitions. 
+ */ + if (child_relkind == RELKIND_PARTITIONED_TABLE && + RelationGetPartitionDesc(childrel)->nparts == 0) + { + heap_close(childrel, lockmode); + return; + } + + child_reltype = childrel->rd_rel->reltype; + childDesc = RelationGetDescr(childrel); + } + else + { + child_relkind = parentrte->relkind; + child_reltype = parentRelType; + childDesc = parentDesc; + } /* * Build an RTE for the child, and attach to query's rangetable list. We @@ -1798,7 +1726,7 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, childrte = copyObject(parentrte); *childrte_p = childrte; childrte->relid = childOID; - childrte->relkind = childrel->rd_rel->relkind; + childrte->relkind = child_relkind; /* A partitioned child will need to be expanded further. */ if (childOID != parentOID && childrte->relkind == RELKIND_PARTITIONED_TABLE) @@ -1823,12 +1751,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, appinfo = makeNode(AppendRelInfo); appinfo->parent_relid = parentRTindex; appinfo->child_relid = childRTindex; - appinfo->parent_reltype = parentrel->rd_rel->reltype; - appinfo->child_reltype = childrel->rd_rel->reltype; - make_inh_translation_list(parentrel, childrel, childRTindex, + appinfo->parent_reltype = parentRelType; + appinfo->child_reltype = child_reltype; + make_inh_translation_list(parentDesc, childDesc, + parentrte, childrte, childRTindex, &appinfo->translated_vars); appinfo->parent_reloid = parentOID; - *appinfos = lappend(*appinfos, appinfo); + *appinfo_p = appinfo; /* * Translate the column permissions bitmaps to the child's attnums (we @@ -1879,6 +1808,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte, root->rowMarks = lappend(root->rowMarks, childrc); } + + /* Close child relations, but keep locks */ + if (childOID != parentOID) + { + Assert(childrel != NULL); + heap_close(childrel, lockmode); + } } /* @@ -1889,14 +1825,12 @@ expand_single_inheritance_child(PlannerInfo *root, 
RangeTblEntry *parentrte, * For paranoia's sake, we match type/collation as well as attribute name. */ static void -make_inh_translation_list(Relation oldrelation, Relation newrelation, - Index newvarno, - List **translated_vars) +make_inh_translation_list(TupleDesc old_tupdesc, TupleDesc new_tupdesc, + RangeTblEntry *oldrte, RangeTblEntry *newrte, + Index newvarno, List **translated_vars) { List *vars = NIL; - TupleDesc old_tupdesc = RelationGetDescr(oldrelation); - TupleDesc new_tupdesc = RelationGetDescr(newrelation); - Oid new_relid = RelationGetRelid(newrelation); + Oid new_relid = newrte->relid; int oldnatts = old_tupdesc->natts; int newnatts = new_tupdesc->natts; int old_attno; @@ -1926,7 +1860,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, * When we are generating the "translation list" for the parent table * of an inheritance set, no need to search for matches. */ - if (oldrelation == newrelation) + if (oldrte->relid == newrte->relid) { vars = lappend(vars, makeVar(newvarno, (AttrNumber) (old_attno + 1), @@ -1955,7 +1889,7 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, newtup = SearchSysCacheAttName(new_relid, attname); if (!newtup) elog(ERROR, "could not find inherited attribute \"%s\" of relation \"%s\"", - attname, RelationGetRelationName(newrelation)); + attname, get_rel_name(newrte->relid)); new_attno = ((Form_pg_attribute) GETSTRUCT(newtup))->attnum - 1; ReleaseSysCache(newtup); @@ -1965,10 +1899,10 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, /* Found it, check type and collation match */ if (atttypid != att->atttypid || atttypmod != att->atttypmod) elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's type", - attname, RelationGetRelationName(newrelation)); + attname, get_rel_name(newrte->relid)); if (attcollation != att->attcollation) elog(ERROR, "attribute \"%s\" of relation \"%s\" does not match parent's collation", - attname, 
RelationGetRelationName(newrelation)); + attname, get_rel_name(newrte->relid)); vars = lappend(vars, makeVar(newvarno, (AttrNumber) (new_attno + 1), @@ -2121,7 +2055,7 @@ adjust_appendrel_attrs_mutator(Node *node, } } - if (var->varlevelsup == 0 && appinfo) + if (var->varlevelsup == 0 && appinfo && appinfo->translated_vars) { var->varno = appinfo->child_relid; var->varnoold = appinfo->child_relid; diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 8d67f21f42..f93cc6b90d 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -106,7 +106,7 @@ static void set_baserel_partition_key_exprs(Relation relation, */ void get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, - RelOptInfo *rel) + Bitmapset *updatedCols, RelOptInfo *rel) { Index varno = rel->relid; Relation relation; @@ -449,7 +449,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, * inheritance parents may be partitioned. 
*/ if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { set_relation_partition_info(root, rel, relation); + if (!root->partColsUpdated) + root->partColsUpdated = + has_partition_attrs(relation, updatedCols, NULL); + } + + rel->tupdesc = RelationGetDescr(relation); + rel->reltype = RelationGetForm(relation)->reltype; heap_close(relation, NoLock); @@ -1871,18 +1879,18 @@ set_relation_partition_info(PlannerInfo *root, RelOptInfo *rel, Relation relation) { PartitionDesc partdesc; - PartitionKey partkey; Assert(relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); partdesc = RelationGetPartitionDesc(relation); - partkey = RelationGetPartitionKey(relation); rel->part_scheme = find_partition_scheme(root, relation); Assert(partdesc != NULL && rel->part_scheme != NULL); - rel->boundinfo = partition_bounds_copy(partdesc->boundinfo, partkey); + rel->boundinfo = partdesc->boundinfo; rel->nparts = partdesc->nparts; set_baserel_partition_key_exprs(relation, rel); rel->partition_qual = RelationGetPartitionQual(relation); + rel->part_oids = (Oid *) palloc(rel->nparts * sizeof(Oid)); + memcpy(rel->part_oids, partdesc->oids, rel->nparts * sizeof(Oid)); } /* diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index c69740eda6..46ecb52166 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,6 +16,7 @@ #include <limits.h> +#include "catalog/pg_class.h" #include "miscadmin.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" @@ -27,6 +28,7 @@ #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" #include "partitioning/partbounds.h" +#include "storage/lockdefs.h" #include "utils/hsearch.h" @@ -137,6 +139,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) /* Rel should not exist already */ Assert(relid > 0 && relid < root->simple_rel_array_size); + if (root->simple_rel_array[relid] != NULL) elog(ERROR, "rel %d already exists", relid); @@ 
-218,7 +221,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) { case RTE_RELATION: /* Table --- retrieve statistics from the system catalogs */ - get_relation_info(root, rte->relid, rte->inh, rel); + get_relation_info(root, rte->relid, rte->inh, rte->updatedCols, + rel); break; case RTE_SUBQUERY: case RTE_FUNCTION: @@ -268,41 +272,30 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) if (rte->inh) { ListCell *l; - int nparts = rel->nparts; - int cnt_parts = 0; - if (nparts > 0) + /* + * For partitioned tables, we just allocate space for RelOptInfo + * pointers for all partitions and copy the partition OIDs from the + * relcache. An actual RelOptInfo is built for a partition only if it + * is not pruned. + */ + if (rte->relkind == RELKIND_PARTITIONED_TABLE) + { rel->part_rels = (RelOptInfo **) - palloc(sizeof(RelOptInfo *) * nparts); + palloc0(sizeof(RelOptInfo *) * rel->nparts); + return rel; + } foreach(l, root->append_rel_list) { AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l); - RelOptInfo *childrel; /* append_rel_list contains all append rels; ignore others */ if (appinfo->parent_relid != relid) continue; - childrel = build_simple_rel(root, appinfo->child_relid, - rel); - - /* Nothing more to do for an unpartitioned table. */ - if (!rel->part_scheme) - continue; - - /* - * The order of partition OIDs in append_rel_list is the same as - * the order in the PartitionDesc, so the order of part_rels will - * also match the PartitionDesc. See expand_partitioned_rtentry. - */ - Assert(cnt_parts < nparts); - rel->part_rels[cnt_parts] = childrel; - cnt_parts++; + (void) build_simple_rel(root, appinfo->child_relid, rel); } - - /* We should have seen all the child partitions. 
*/ - Assert(cnt_parts == nparts); } return rel; @@ -1767,4 +1760,135 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel, joinrel->partexprs[cnt] = partexpr; joinrel->nullable_partexprs[cnt] = nullable_partexpr; } + + /* Partitions will be added by try_partitionwise_join. */ + joinrel->live_parts = NULL; +} + +/* + * build_dummy_partition_rel + * Build a RelOptInfo and AppendRelInfo for a pruned partition + * + * This does not result in opening the relation or a range table entry being + * created. Also, the RelOptInfo thus created is not stored anywhere else + * beside the parent's part_rels array. + * + * The only reason this exists is because partition-wise join, in some cases, + * needs a RelOptInfo to represent an empty relation that's on the nullable + * side of an outer join, so that a Path representing the outer join can be + * created. + */ +RelOptInfo * +build_dummy_partition_rel(PlannerInfo *root, RelOptInfo *parent, int partidx) +{ + RelOptInfo *rel; + + Assert(parent->part_rels[partidx] == NULL); + + /* Create minimally valid-looking RelOptInfo with parent's relid. */ + rel = makeNode(RelOptInfo); + rel->reloptkind = RELOPT_OTHER_MEMBER_REL; + rel->relid = parent->relid; + rel->relids = bms_copy(parent->relids); + if (parent->top_parent_relids) + rel->top_parent_relids = parent->top_parent_relids; + else + rel->top_parent_relids = bms_copy(parent->relids); + rel->reltarget = copy_pathtarget(parent->reltarget); + parent->part_rels[partidx] = rel; + mark_dummy_rel(rel); + + /* + * Now we'll need a (noop) AppendRelInfo for parent, because we're setting + * the dummy partition's relid to be same as the parent's. 
+ */ + if (root->append_rel_array[parent->relid] == NULL) + { + AppendRelInfo *appinfo = makeNode(AppendRelInfo); + + appinfo->parent_relid = parent->relid; + appinfo->child_relid = parent->relid; + appinfo->parent_reltype = parent->reltype; + appinfo->child_reltype = parent->reltype; + /* leave translated_vars as NIL to mean no translation is needed */ + appinfo->parent_reloid = root->simple_rte_array[parent->relid]->relid; + root->append_rel_array[parent->relid] = appinfo; + } + + return rel; +} + +/* + * build_partition_rel + * This adds a valid partition to the query by adding it to the + * range table and creating planner data structures for it + */ +RelOptInfo * +build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid) +{ + RangeTblEntry *parentrte = root->simple_rte_array[parent->relid]; + RelOptInfo *result; + Index partRTindex = 0; + RangeTblEntry *partrte = NULL; + AppendRelInfo *appinfo = NULL; + PlanRowMark *rootrc = NULL; + + /* Locate the root partitioned table and fetch its PlanRowMark, if any. */ + if (root->rowMarks) + { + Index rootRTindex = 0; + + /* + * The root partitioned table itself might be a child of a UNION ALL + * parent, so we must resort to finding the root parent like this. + */ + rootRTindex = parent->relid; + if (root->append_rel_array[rootRTindex]) + { + AppendRelInfo *tmp = root->append_rel_array[rootRTindex]; + + /* + * Keep moving up until we reach the parent rel that's not a + * partitioned table. The one before that one would be the root + * parent. + */ + while (root->simple_rel_array[rootRTindex]->part_scheme) + { + tmp = root->append_rel_array[tmp->parent_relid]; + if (tmp == NULL) + break; + rootRTindex = tmp->parent_relid; + } + } + + rootrc = get_plan_rowmark(root->rowMarks, rootRTindex); + } + + /* + * expand_inherited_rtentry already locked all partitions, so pass + * NoLock for lockmode. 
+ */ + add_inheritance_child_to_query(root, parentrte, parent->relid, + parent->reltype, parent->tupdesc, + rootrc, partoid, NoLock, + &appinfo, &partrte, &partRTindex); + + /* Partition turned out to be a partitioned table with 0 partitions. */ + if (partrte == NULL) + return NULL; + + Assert(appinfo != NULL); + root->append_rel_list = lappend(root->append_rel_list, appinfo); + root->simple_rte_array[partRTindex] = partrte; + root->append_rel_array[partRTindex] = appinfo; + + /* Build the RelOptInfo. */ + result = build_simple_rel(root, partRTindex, parent); + + /* Set the information created by create_lateral_join_info(). */ + result->direct_lateral_relids = parent->direct_lateral_relids; + result->lateral_relids = parent->lateral_relids; + result->lateral_referencers = parent->lateral_referencers; + + return result; } diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c index b5c1c7d4dd..b2b76f5af3 100644 --- a/src/backend/partitioning/partprune.c +++ b/src/backend/partitioning/partprune.c @@ -45,7 +45,9 @@ #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" +#include "optimizer/cost.h" #include "optimizer/pathnode.h" +#include "optimizer/planmain.h" #include "optimizer/planner.h" #include "optimizer/predtest.h" #include "optimizer/prep.h" @@ -437,26 +439,26 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel, * is, not pruned already). 
*/ subplan_map = (int *) palloc(nparts * sizeof(int)); + memset(subplan_map, -1, nparts * sizeof(int)); subpart_map = (int *) palloc(nparts * sizeof(int)); - present_parts = NULL; + memset(subpart_map, -1, nparts * sizeof(int)); + Assert(IS_SIMPLE_REL(subpart)); + present_parts = bms_copy(subpart->live_parts); - for (i = 0; i < nparts; i++) + i = -1; + while ((i = bms_next_member(present_parts, i)) >= 0) { RelOptInfo *partrel = subpart->part_rels[i]; - int subplanidx = relid_subplan_map[partrel->relid] - 1; - int subpartidx = relid_subpart_map[partrel->relid] - 1; + int subplanidx; + int subpartidx; + subplanidx = relid_subplan_map[partrel->relid] - 1; + subpartidx = relid_subpart_map[partrel->relid] - 1; subplan_map[i] = subplanidx; subpart_map[i] = subpartidx; + /* Record finding this subplan */ if (subplanidx >= 0) - { - present_parts = bms_add_member(present_parts, i); - - /* Record finding this subplan */ subplansfound = bms_add_member(subplansfound, subplanidx); - } - else if (subpartidx >= 0) - present_parts = bms_add_member(present_parts, i); } rte = root->simple_rte_array[subpart->relid]; @@ -548,61 +550,68 @@ gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *contradictory) * * Callers must ensure that 'rel' is a partitioned table. */ -Relids -prune_append_rel_partitions(RelOptInfo *rel) +void +prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel) { - Relids result; List *clauses = rel->baserestrictinfo; List *pruning_steps; - bool contradictory; + bool contradictory, + scan_all_parts = false; PartitionPruneContext context; - Bitmapset *partindexes; - int i; + Bitmapset *partindexes = NULL; - Assert(clauses != NIL); Assert(rel->part_scheme != NULL); /* If there are no partitions, return the empty set */ if (rel->nparts == 0) - return NULL; + return; - /* - * Process clauses. If the clauses are found to be contradictory, we can - * return the empty set. 
-	 */
-	pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);
-	if (contradictory)
-		return NULL;
-
-	/* Set up PartitionPruneContext */
-	context.strategy = rel->part_scheme->strategy;
-	context.partnatts = rel->part_scheme->partnatts;
-	context.nparts = rel->nparts;
-	context.boundinfo = rel->boundinfo;
-	context.partcollation = rel->part_scheme->partcollation;
-	context.partsupfunc = rel->part_scheme->partsupfunc;
-	context.stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
+	if (enable_partition_pruning && clauses != NIL)
+	{
+		/*
+		 * Process clauses. If the clauses are found to be contradictory, we
+		 * can return the empty set.
+		 */
+		pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);
+		if (!contradictory)
+		{
+			context.strategy = rel->part_scheme->strategy;
+			context.partnatts = rel->part_scheme->partnatts;
+			context.nparts = rel->nparts;
+			context.boundinfo = rel->boundinfo;
+			context.partcollation = rel->part_scheme->partcollation;
+			context.partsupfunc = rel->part_scheme->partsupfunc;
+			context.stepcmpfuncs = (FmgrInfo *)
+										palloc0(sizeof(FmgrInfo) *
 												context.partnatts *
 												list_length(pruning_steps));
-	context.ppccontext = CurrentMemoryContext;
+			context.ppccontext = CurrentMemoryContext;
 
-	/* These are not valid when being called from the planner */
-	context.partrel = NULL;
-	context.planstate = NULL;
-	context.exprstates = NULL;
-	context.exprhasexecparam = NULL;
-	context.evalexecparams = false;
+			/* These are not valid when being called from the planner */
+			context.partrel = NULL;
+			context.planstate = NULL;
+			context.exprstates = NULL;
+			context.exprhasexecparam = NULL;
+			context.evalexecparams = false;
 
-	/* Actual pruning happens here. */
-	partindexes = get_matching_partitions(&context, pruning_steps);
+			/* Actual pruning happens here. */
+			partindexes = get_matching_partitions(&context, pruning_steps);
 
-	/* Add selected partitions' RT indexes to result. */
-	i = -1;
-	result = NULL;
-	while ((i = bms_next_member(partindexes, i)) >= 0)
-		result = bms_add_member(result, rel->part_rels[i]->relid);
+			/* No need to add partitions if all were pruned. */
+			if (bms_is_empty(partindexes))
+				return;
+		}
+		else
+			scan_all_parts = true;
+	}
+	else
+		scan_all_parts = true;
 
-	return result;
+	/*
+	 * Build selected partitions' range table entries, RelOptInfos, and
+	 * AppendRelInfos.
+	 */
+	add_rel_partitions_to_query(root, rel, scan_all_parts, partindexes);
 }
 
 /*
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 41caf873fb..02c5bdc73f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -15,6 +15,7 @@
 #define RELATION_H
 
 #include "access/sdir.h"
+#include "access/tupdesc.h"
 #include "fmgr.h"
 #include "lib/stringinfo.h"
 #include "nodes/params.h"
@@ -695,11 +696,15 @@ typedef struct RelOptInfo
 	int nparts; /* number of partitions */
 	struct PartitionBoundInfoData *boundinfo; /* Partition bounds */
 	List *partition_qual; /* partition constraint */
+	Oid *part_oids; /* partition OIDs */
 	struct RelOptInfo **part_rels; /* Array of RelOptInfos of partitions,
 								    * stored in the same order of bounds */
+	Bitmapset *live_parts; /* unpruned parts; NULL if all are live */
 	List **partexprs; /* Non-nullable partition key expressions. */
 	List **nullable_partexprs; /* Nullable partition key expressions. */
 	List *partitioned_child_rels; /* List of RT indexes. */
+	TupleDesc tupdesc;
+	Oid reltype;
 } RelOptInfo;
 
 /*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22650..4f567765a4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -297,5 +297,11 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 					 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
 					 RelOptInfo *parent_joinrel, List *restrictlist,
 					 SpecialJoinInfo *sjinfo, JoinType jointype);
+extern RelOptInfo *build_dummy_partition_rel(PlannerInfo *root,
+					 RelOptInfo *parent,
+					 int partidx);
+extern RelOptInfo *build_partition_rel(PlannerInfo *root,
+					 RelOptInfo *parent,
+					 Oid partoid);
 
 #endif /* PATHNODE_H */
diff --git a/src/include/optimizer/plancat.h b/src/include/optimizer/plancat.h
index 7d53cbbb87..edaf2a3b4f 100644
--- a/src/include/optimizer/plancat.h
+++ b/src/include/optimizer/plancat.h
@@ -26,7 +26,7 @@ extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
 
 
 extern void get_relation_info(PlannerInfo *root, Oid relationObjectId,
-				  bool inhparent, RelOptInfo *rel);
+				  bool inhparent, Bitmapset *updatedCols, RelOptInfo *rel);
 
 extern List *infer_arbiter_indexes(PlannerInfo *root);
 
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab0280d2..1916a33467 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,9 @@ extern int from_collapse_limit;
 extern int join_collapse_limit;
 
 extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
+extern void add_rel_partitions_to_query(PlannerInfo *root, RelOptInfo *rel,
+							bool scan_all_parts,
+							Bitmapset *partindexes);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
 					   Relids where_needed, bool create_new_ph);
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index 38608770a2..ca66f75544 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -49,6 +49,16 @@ extern RelOptInfo *plan_set_operations(PlannerInfo *root);
 
 extern void expand_inherited_tables(PlannerInfo *root);
 
+extern void add_inheritance_child_to_query(PlannerInfo *root,
+							   RangeTblEntry *parentrte,
+							   Index parentRTindex, Oid parentRelType,
+							   TupleDesc parentDesc,
+							   PlanRowMark *top_parentrc,
+							   Oid childOID, int lockmode,
+							   AppendRelInfo **appinfo_p,
+							   RangeTblEntry **childrte_p,
+							   Index *childRTindex_p);
+
 extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
 					   int nappinfos, AppendRelInfo **appinfos);
 
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index b95c346bab..55a324583b 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -79,7 +79,7 @@ extern PartitionPruneInfo *make_partition_pruneinfo(PlannerInfo *root,
 						 List *subpaths,
 						 List *partitioned_rels,
 						 List *prunequal);
-extern Relids prune_append_rel_partitions(RelOptInfo *rel);
+extern void prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel);
 extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
 						List *pruning_steps);
 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262be43..5f931591a6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5533,29 +5533,29 @@ select t1.b, ss.phv from join_ut1 t1 left join lateral
               (select t2.a as t2a, t3.a t3a, least(t1.a, t2.a, t3.a) phv
 					  from join_pt1 t2 join join_ut1 t3 on t2.a = t3.b) ss
               on t1.a = ss.t2a order by t1.a;
-                            QUERY PLAN
-------------------------------------------------------------------
+                             QUERY PLAN
+--------------------------------------------------------------------
  Sort
-   Output: t1.b, (LEAST(t1.a, t2.a, t3.a)), t1.a
+   Output: t1.b, (LEAST(t1.a, t2_1.a, t3.a)), t1.a
    Sort Key: t1.a
    ->  Nested Loop Left Join
-         Output: t1.b, (LEAST(t1.a, t2.a, t3.a)), t1.a
+         Output: t1.b, (LEAST(t1.a, t2_1.a, t3.a)), t1.a
          ->  Seq Scan on public.join_ut1 t1
                Output: t1.a, t1.b, t1.c
          ->  Hash Join
-               Output: t2.a, LEAST(t1.a, t2.a, t3.a)
-               Hash Cond: (t3.b = t2.a)
+               Output: t2_1.a, LEAST(t1.a, t2_1.a, t3.a)
+               Hash Cond: (t3.b = t2_1.a)
                ->  Seq Scan on public.join_ut1 t3
                      Output: t3.a, t3.b, t3.c
                ->  Hash
-                     Output: t2.a
+                     Output: t2_1.a
                      ->  Append
-                           ->  Seq Scan on public.join_pt1p1p1 t2
-                                 Output: t2.a
-                                 Filter: (t1.a = t2.a)
-                           ->  Seq Scan on public.join_pt1p2 t2_1
+                           ->  Seq Scan on public.join_pt1p1p1 t2_1
                                  Output: t2_1.a
                                  Filter: (t1.a = t2_1.a)
+                           ->  Seq Scan on public.join_pt1p2 t2
+                                 Output: t2.a
+                                 Filter: (t1.a = t2.a)
 (21 rows)
 
 select t1.b, ss.phv from join_ut1 t1 left join lateral
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index d286050c9a..d1ce6ad423 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -144,7 +144,7 @@ SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
            QUERY PLAN
 --------------------------------
  HashAggregate
-   Group Key: pagg_tab.c
+   Group Key: c
    ->  Result
          One-Time Filter: false
 (4 rows)
@@ -159,7 +159,7 @@ SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
            QUERY PLAN
 --------------------------------
  GroupAggregate
-   Group Key: pagg_tab.c
+   Group Key: c
    ->  Result
          One-Time Filter: false
 (4 rows)
-- 
2.11.0
From c768ffe75d37df9d12775a3477cab1b0f1314eb1 Mon Sep 17 00:00:00 2001
From: amit <amitlangot...@gmail.com>
Date: Thu, 23 Aug 2018 17:30:18 +0900
Subject: [PATCH v2 3/3] Only lock partitions that will be scanned by a query

---
 src/backend/optimizer/prep/prepunion.c |  8 +++-----
 src/backend/optimizer/util/relnode.c   | 17 ++++++++++-------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 279f686fb0..6a2adb5f4d 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1555,14 +1555,15 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		lockmode = AccessShareLock;
 
 	/* Scan for all members of inheritance set, acquire needed locks */
-	inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
+	if (rte->relkind != RELKIND_PARTITIONED_TABLE)
+		inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
 
 	/*
 	 * Check that there's at least one descendant, else treat as no-child
 	 * case. This could happen despite above has_subclass() check, if table
 	 * once had a child but no longer does.
 	 */
-	if (list_length(inhOIDs) < 2)
+	if (rte->relkind != RELKIND_PARTITIONED_TABLE && list_length(inhOIDs) < 2)
 	{
 		/* Clear flag before returning */
 		rte->inh = false;
@@ -1579,10 +1580,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 
 	/* Partitioned tables are expanded elsewhere. */
 	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
-	{
-		list_free(inhOIDs);
 		return;
-	}
 
 	/*
 	 * Must open the parent relation to examine its tupdesc. We need not lock
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 46ecb52166..0a0d7bcd26 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -1828,16 +1828,16 @@ build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid)
 {
 	RangeTblEntry *parentrte = root->simple_rte_array[parent->relid];
 	RelOptInfo *result;
+	Index rootRTindex = 0;
 	Index partRTindex = 0;
 	RangeTblEntry *partrte = NULL;
 	AppendRelInfo *appinfo = NULL;
 	PlanRowMark *rootrc = NULL;
+	int lockmode;
 
 	/* Locate the root partitioned table and fetch its PlanRowMark, if any. */
 	if (root->rowMarks)
 	{
-		Index rootRTindex = 0;
-
 		/*
 		 * The root partitioned table itself might be a child of UNION ALL
 		 * parent, so we must resort to finding the root parent like this.
@@ -1864,13 +1864,16 @@ build_partition_rel(PlannerInfo *root, RelOptInfo *parent, Oid partoid)
 		rootrc = get_plan_rowmark(root->rowMarks, rootRTindex);
 	}
 
-	/*
-	 * expand_inherited_rtentry alreay locked all partitions, so pass
-	 * NoLock for lockmode.
-	 */
+	/* Determine the correct lockmode to use. */
+	if (rootRTindex == root->parse->resultRelation)
+		lockmode = RowExclusiveLock;
+	else if (rootrc && RowMarkRequiresRowShareLock(rootrc->markType))
+		lockmode = RowShareLock;
+	else
+		lockmode = AccessShareLock;
 	add_inheritance_child_to_query(root, parentrte, parent->relid,
 								   parent->reltype, parent->tupdesc,
-								   rootrc, partoid, NoLock,
+								   rootrc, partoid, lockmode,
 								   &appinfo, &partrte, &partRTindex);
 
 	/* Partition turned out to be a partitioned table with 0 partitions. */
-- 
2.11.0