Thanks Amit.
Looking at the latest v25 patch.
On 2017/11/16 23:50, Amit Khandekar wrote:
> Below has the responses for both Amit's and David's comments, starting
> with Amit's ....
> On 2 November 2017 at 12:40, Amit Langote <[email protected]>
> wrote:
>> On 2017/10/24 0:15, Amit Khandekar wrote:
>>> On 16 October 2017 at 08:28, Amit Langote <[email protected]>
>>> wrote:
>>>>
>>>> + (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup
>>>> == NULL))))
>>>>
>>>> Is there some reason why a bitwise operator is used here?
>>>
>>> That exact condition means that the function is called for transition
>>> capture for updated rows being moved to another partition. For this
>>> scenario, either the oldtup or the newtup is NULL. I wanted to exactly
>>> capture that condition there. I think the bitwise operator is more
>>> user-friendly in emphasizing the point that it is indeed an "either a
>>> or b, not both" condition.
>>
>> I see. In that case, since this patch adds the new condition, a note
>> about it in the comment just above would be good, because the situation
>> you describe here seems to arise only during update-tuple-routing, IIUC.
>
> Done. Please check.
Looks fine.
>> + * 'update_rri' has the UPDATE per-subplan result rels. These are re-used
>> + * instead of allocating new ones while generating the array of all
>> leaf
>> + * partition result rels.
>>
>> Instead of:
>>
>> "These are re-used instead of allocating new ones while generating the
>> array of all leaf partition result rels."
>>
>> how about:
>>
>> "There is no need to allocate a new ResultRellInfo entry for leaf
>> partitions for which one already exists in this array"
>
> Ok. I have made it like this :
>
> + * 'update_rri' contains the UPDATE per-subplan result rels. For the
> output param
> + * 'partitions', we don't allocate new ResultRelInfo objects for
> + * leaf partitions for which they are already available
> in 'update_rri'.
Sure.
>>> It looks like the interface does not much simplify, and above that, we
>>> have more number of lines in that function. Also, the caller anyway
>>> has to be aware whether the map_index is the index into the leaf
>>> partitions or the update subplans. So it is not like the caller does
>>> not have to be aware about whether the mapping should be
>>> mt_persubplan_childparent_maps or mt_perleaf_parentchild_maps.
>>
>> Hmm, I think we should try to make it so that the caller doesn't have to
>> be aware of that. And by caller I guess you mean ExecInsert(), which
>> should not be a place, IMHO, where to try to introduce a lot of new logic
>> specific to update tuple routing.
>
> I think, for ExecInsert() since we have already given the job of
> routing the tuple from root partitioned table to a partition, it makes
> sense to give the function the additional job of routing the tuple
> from any partition to any partition. ExecInsert() can be looked at as
> doing this job : "insert a tuple into the right partition; the
> original tuple can belong to any partition"
Yeah, that's one way of looking at that. But I think ExecInsert() as it
is today thinks it's got a *new* tuple and that's it. I think the newly
introduced code in it to find out that it is not so (that the tuple
actually comes from some other partition), that it's really the
update-turned-into-delete-plus-insert, and then switch to the root
partitioned table's ResultRelInfo, etc. really belongs outside of it.
Maybe in its caller, which is ExecUpdate(). I mean why not add this code
to the block in ExecUpdate() that handles update-row-movement.
Just before calling ExecInsert() to do the re-routing seems like a good
place to do all that. For example, try the attached incremental patch
that applies on top of yours. I can see after applying it that diffs to
ExecInsert() are now just some refactoring ones and there are no
significant additions making it look like supporting update-row-movement
required substantial changes to how ExecInsert() itself works.
> After doing the changes for the int[] array map in the previous patch
> version, I still feel that ConvertPartitionTupleSlot() should be
> retained. We save some repeated lines of code saved.
OK.
>> You may be right, but I see for WithCheckOptions initialization
>> specifically that the non-tuple-routing code passes the actual sub-plan
>> when initializing the WCO for a given result rel.
>
> Yes that's true. The problem with WithCheckOptions for newly allocated
> partition result rels is : we can't use a subplan for the parent
> parameter because there is no subplan for it. But I will still think
> on it a bit more (TODO).
Alright.
>>> I think you are suggesting we do it like how it's done in
>>> is_partition_attr(). Can you please let me know other places we do
>>> this same way ? I couldn't find.
>>
>> OK, not as many as I thought there would be, but there are following
>> beside is_partition_attrs():
>>
>> partition.c: get_range_nulltest()
>> partition.c: get_qual_for_range()
>> relcache.c: RelationBuildPartitionKey()
>>
>
> Ok, I think I will first address Robert's suggestion of re-using
> is_partition_attrs() for pull_child_partition_columns(). If I do that,
> this discussion won't be applicable, so I am deferring this one.
> (TODO)
Sure, no problem.
Thanks,
Amit
diff --git a/src/backend/executor/nodeModifyTable.c
b/src/backend/executor/nodeModifyTable.c
index a0d8259663..09d16f4509 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -327,37 +327,6 @@ ExecInsert(ModifyTableState *mtstate,
if (mtstate->mt_partition_dispatch_info)
{
int leaf_part_index;
- ResultRelInfo *rootResultRelInfo;
-
- /*
- * For UPDATE, the resultRelInfo is not the root partitioned
table, so
- * we should convert the tuple into root's tuple descriptor,
since
- * ExecFindPartition() starts the search from root. The tuple
- * conversion map list is in the order of
mtstate->resultRelInfo[], so
- * to retrieve the one for this resultRel, we need to know the
position
- * of the resultRel in mtstate->resultRelInfo[].
- */
- if (mtstate->operation == CMD_UPDATE)
- {
- int map_index = resultRelInfo -
mtstate->resultRelInfo;
- TupleConversionMap *tupconv_map;
-
- Assert(mtstate->rootResultRelInfo != NULL);
- rootResultRelInfo = mtstate->rootResultRelInfo;
-
- /* resultRelInfo must be one of the per-subplan result
rels. */
- Assert(resultRelInfo >= mtstate->resultRelInfo &&
- resultRelInfo <= mtstate->resultRelInfo +
mtstate->mt_nplans - 1);
-
- tupconv_map = tupconv_map_for_subplan(mtstate,
map_index);
- tuple = ConvertPartitionTupleSlot(mtstate,
-
tupconv_map,
-
tuple,
-
mtstate->mt_root_tuple_slot,
-
&slot);
- }
- else
- rootResultRelInfo = resultRelInfo;
/*
* Away we go ... If we end up not finding a partition after
all,
@@ -367,7 +336,7 @@ ExecInsert(ModifyTableState *mtstate,
* the ResultRelInfo and TupleConversionMap for the partition,
* respectively.
*/
- leaf_part_index = ExecFindPartition(rootResultRelInfo,
+ leaf_part_index = ExecFindPartition(resultRelInfo,
mtstate->mt_partition_dispatch_info,
slot,
estate);
@@ -1178,6 +1147,8 @@ lreplace:;
{
bool tuple_deleted;
TupleTableSlot *ret_slot;
+ TupleConversionMap *tupconv_map;
+ int subplan_off;
/*
* When an UPDATE is run with a leaf partition, we
would not have
@@ -1227,8 +1198,30 @@ lreplace:;
if (mtstate->mt_transition_capture)
saved_tcs_map =
mtstate->mt_transition_capture->tcs_map;
+
+ /*
+ * For UPDATE, the resultRelInfo is not the root
partitioned
+ * table, so we should convert the tuple into root's
tuple
+ * descriptor, since ExecInsert() starts the search
from root.
+ */
+ subplan_off = resultRelInfo - mtstate->resultRelInfo;
+ tupconv_map = tupconv_map_for_subplan(mtstate,
subplan_off);
+ tuple = ConvertPartitionTupleSlot(mtstate,
+
tupconv_map,
+
tuple,
+
mtstate->mt_root_tuple_slot,
+
&slot);
+
+ /*
+ * Make it look like to ExecInsert() that we are
inserting the
+ * tuple into the root table.
+ */
+ Assert(mtstate->rootResultRelInfo != NULL);
+ estate->es_result_relation_info =
mtstate->rootResultRelInfo;
ret_slot = ExecInsert(mtstate, slot, planSlot, NULL,
ONCONFLICT_NONE, estate, canSetTag);
+ /* Restore for the next tuple. */
+ estate->es_result_relation_info = resultRelInfo;
if (mtstate->mt_transition_capture)
{