Thanks Amit. Looking at the latest v25 patch.
On 2017/11/16 23:50, Amit Khandekar wrote: > Below has the responses for both Amit's and David's comments, starting > with Amit's .... > On 2 November 2017 at 12:40, Amit Langote <langote_amit...@lab.ntt.co.jp> > wrote: >> On 2017/10/24 0:15, Amit Khandekar wrote: >>> On 16 October 2017 at 08:28, Amit Langote <langote_amit...@lab.ntt.co.jp> >>> wrote: >>>> >>>> + (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup >>>> == NULL)))) >>>> >>>> Is there some reason why a bitwise operator is used here? >>> >>> That exact condition means that the function is called for transition >>> capture for updated rows being moved to another partition. For this >>> scenario, either the oldtup or the newtup is NULL. I wanted to exactly >>> capture that condition there. I think the bitwise operator is more >>> user-friendly in emphasizing the point that it is indeed an "either a >>> or b, not both" condition. >> >> I see. In that case, since this patch adds the new condition, a note >> about it in the comment just above would be good, because the situation >> you describe here seems to arise only during update-tuple-routing, IIUC. > > Done. Please check. Looks fine. >> + * 'update_rri' has the UPDATE per-subplan result rels. These are re-used >> + * instead of allocating new ones while generating the array of all >> leaf >> + * partition result rels. >> >> Instead of: >> >> "These are re-used instead of allocating new ones while generating the >> array of all leaf partition result rels." >> >> how about: >> >> "There is no need to allocate a new ResultRellInfo entry for leaf >> partitions for which one already exists in this array" > > Ok. I have made it like this : > > + * 'update_rri' contains the UPDATE per-subplan result rels. For the > output param > + * 'partitions', we don't allocate new ResultRelInfo objects for > + * leaf partitions for which they are already available > in 'update_rri'. Sure. >>> It looks like the interface does not much simplify, and above that, we >>> have more number of lines in that function. Also, the caller anyway >>> has to be aware whether the map_index is the index into the leaf >>> partitions or the update subplans. So it is not like the caller does >>> not have to be aware about whether the mapping should be >>> mt_persubplan_childparent_maps or mt_perleaf_parentchild_maps. >> >> Hmm, I think we should try to make it so that the caller doesn't have to >> be aware of that. And by caller I guess you mean ExecInsert(), which >> should not be a place, IMHO, where to try to introduce a lot of new logic >> specific to update tuple routing. > > I think, for ExecInsert() since we have already given the job of > routing the tuple from root partitioned table to a partition, it makes > sense to give the function the additional job of routing the tuple > from any partition to any partition. ExecInsert() can be looked at as > doing this job : "insert a tuple into the right partition; the > original tuple can belong to any partition" Yeah, that's one way of looking at that. But I think ExecInsert() as it is today thinks it's got a *new* tuple and that's it. I think the newly introduced code in it to find out that it is not so (that the tuple actually comes from some other partition), that it's really the update-turned-into-delete-plus-insert, and then switch to the root partitioned table's ResultRelInfo, etc. really belongs outside of it. Maybe in its caller, which is ExecUpdate(). I mean why not add this code to the block in ExecUpdate() that handles update-row-movement. Just before calling ExecInsert() to do the re-routing seems like a good place to do all that. For example, try the attached incremental patch that applies on top of yours. I can see after applying it that diffs to ExecInsert() are now just some refactoring ones and there are no significant additions making it look like supporting update-row-movement required substantial changes to how ExecInsert() itself works. > After doing the changes for the int[] array map in the previous patch > version, I still feel that ConvertPartitionTupleSlot() should be > retained. We save some repeated lines of code saved. OK. >> You may be right, but I see for WithCheckOptions initialization >> specifically that the non-tuple-routing code passes the actual sub-plan >> when initializing the WCO for a given result rel. > > Yes that's true. The problem with WithCheckOptions for newly allocated > partition result rels is : we can't use a subplan for the parent > parameter because there is no subplan for it. But I will still think > on it a bit more (TODO). Alright. >>> I think you are suggesting we do it like how it's done in >>> is_partition_attr(). Can you please let me know other places we do >>> this same way ? I couldn't find. >> >> OK, not as many as I thought there would be, but there are following >> beside is_partition_attrs(): >> >> partition.c: get_range_nulltest() >> partition.c: get_qual_for_range() >> relcache.c: RelationBuildPartitionKey() >> > > Ok, I think I will first address Robert's suggestion of re-using > is_partition_attrs() for pull_child_partition_columns(). If I do that, > this discussion won't be applicable, so I am deferring this one. > (TODO) Sure, no problem. Thanks, Amit
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index a0d8259663..09d16f4509 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -327,37 +327,6 @@ ExecInsert(ModifyTableState *mtstate, if (mtstate->mt_partition_dispatch_info) { int leaf_part_index; - ResultRelInfo *rootResultRelInfo; - - /* - * For UPDATE, the resultRelInfo is not the root partitioned table, so - * we should convert the tuple into root's tuple descriptor, since - * ExecFindPartition() starts the search from root. The tuple - * conversion map list is in the order of mtstate->resultRelInfo[], so - * to retrieve the one for this resultRel, we need to know the position - * of the resultRel in mtstate->resultRelInfo[]. - */ - if (mtstate->operation == CMD_UPDATE) - { - int map_index = resultRelInfo - mtstate->resultRelInfo; - TupleConversionMap *tupconv_map; - - Assert(mtstate->rootResultRelInfo != NULL); - rootResultRelInfo = mtstate->rootResultRelInfo; - - /* resultRelInfo must be one of the per-subplan result rels. */ - Assert(resultRelInfo >= mtstate->resultRelInfo && - resultRelInfo <= mtstate->resultRelInfo + mtstate->mt_nplans - 1); - - tupconv_map = tupconv_map_for_subplan(mtstate, map_index); - tuple = ConvertPartitionTupleSlot(mtstate, - tupconv_map, - tuple, - mtstate->mt_root_tuple_slot, - &slot); - } - else - rootResultRelInfo = resultRelInfo; /* * Away we go ... If we end up not finding a partition after all, @@ -367,7 +336,7 @@ ExecInsert(ModifyTableState *mtstate, * the ResultRelInfo and TupleConversionMap for the partition, * respectively. */ - leaf_part_index = ExecFindPartition(rootResultRelInfo, + leaf_part_index = ExecFindPartition(resultRelInfo, mtstate->mt_partition_dispatch_info, slot, estate); @@ -1178,6 +1147,8 @@ lreplace:; { bool tuple_deleted; TupleTableSlot *ret_slot; + TupleConversionMap *tupconv_map; + int subplan_off; /* * When an UPDATE is run with a leaf partition, we would not have @@ -1227,8 +1198,30 @@ lreplace:; if (mtstate->mt_transition_capture) saved_tcs_map = mtstate->mt_transition_capture->tcs_map; + + /* + * For UPDATE, the resultRelInfo is not the root partitioned + * table, so we should convert the tuple into root's tuple + * descriptor, since ExecInsert() starts the search from root. + */ + subplan_off = resultRelInfo - mtstate->resultRelInfo; + tupconv_map = tupconv_map_for_subplan(mtstate, subplan_off); + tuple = ConvertPartitionTupleSlot(mtstate, + tupconv_map, + tuple, + mtstate->mt_root_tuple_slot, + &slot); + + /* + * Make it look like to ExecInsert() that we are inserting the + * tuple into the root table. + */ + Assert(mtstate->rootResultRelInfo != NULL); + estate->es_result_relation_info = mtstate->rootResultRelInfo; ret_slot = ExecInsert(mtstate, slot, planSlot, NULL, ONCONFLICT_NONE, estate, canSetTag); + /* Restore for the next tuple. */ + estate->es_result_relation_info = resultRelInfo; if (mtstate->mt_transition_capture) {