On 1 June 2017 at 03:25, Robert Haas <robertmh...@gmail.com> wrote:
> Greg/Amit's idea of using the CTID field rather than an infomask bit
> seems like a possibly promising approach. Not everything that needs
> bit-space can use the CTID field, so using it is a little less likely
> to conflict with something else we want to do in the future than using
> a precious infomask bit. However, I'm worried about this:
>
>     /* Make sure there is no forward chain link in t_ctid */
>     tp.t_data->t_ctid = tp.t_self;
>
> The comment does not say *why* we need to make sure that there is no
> forward chain link, but it implies that some code somewhere in the
> system does or at one time did depend on no forward link existing.
> Any such code that still exists will need to be updated. Anybody know
> what code that might be, exactly?

I am going to have an overall look at this approach, and also at code
somewhere else which might be assuming that t_ctid cannot be invalid.
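
Just to make the idea concrete for myself, I imagine the marking would
look something like the sketch below. The macro names are made up purely
for illustration; this is not code from the patch:

#include "postgres.h"
#include "access/htup_details.h"	/* HeapTupleHeader, t_ctid */
#include "storage/itemptr.h"		/* ItemPointerSetBlockNumber, etc. */

/*
 * Illustration only: instead of making a deleted tuple's t_ctid point to
 * itself, store an invalid block number in it, meaning "this row was
 * moved to another partition".  Any code that follows t_ctid chains would
 * then have to check for this marker before treating t_ctid as a link.
 */
#define HeapTupleHeaderSetMovedPartitions(tup) \
	ItemPointerSetBlockNumber(&(tup)->t_ctid, InvalidBlockNumber)

#define HeapTupleHeaderIndicatesMovedPartitions(tup) \
	(ItemPointerGetBlockNumber(&(tup)->t_ctid) == InvalidBlockNumber)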

> Regarding the trigger issue, I can't claim to have a terribly strong
> opinion on this. I think that practically anything we do here might
> upset somebody, but probably any halfway-reasonable thing we choose to
> do will be OK for most people. However, there seems to be a
> discrepancy between the approach that got the most votes and the one
> that is implemented by the v8 patch, so that seems like something to
> fix.

Yes, I have started working on updating the patch to use that approach
(BR and AR update triggers on the source and destination partitions
respectively, instead of delete+insert triggers).

The approach taken by the patch (BR update + delete+insert triggers)
didn't require any changes in the way ExecDelete() and ExecInsert() were
called. Now we will need to skip the delete/insert triggers, so either
some flags need to be passed to these functions, or we need stripped-down
versions of ExecDelete() and ExecInsert() which don't do the other things
like RETURNING handling and firing triggers.

>
> For what it's worth, in the future, I imagine that we might allow
> adding a trigger to a partitioned table and having that cascade down
> to all descendant tables. In that world, firing the BR UPDATE trigger
> for the old partition and the AR UPDATE trigger for the new partition
> will look a lot like the behavior the user would expect on an
> unpartitioned table, which could be viewed as a good thing. On the
> other hand, it's still going to be a DELETE+INSERT under the hood for
> the foreseeable future, so firing the delete triggers and then the
> insert triggers is also defensible.

OK, I was assuming there wouldn't be any plans to support triggers on a
partitioned table, but yes, I had imagined how the behaviour would be in
that world.

Currently, users who want triggers on a table that happens to be
partitioned have to install the same trigger on each of the leaf
partitions, since there is no other choice. But we can never know whether
a trigger on a leaf partition was meant specifically for that individual
partition, or was really meant to be a trigger on the root partitioned
table. Hence the difficulty of deciding the right behaviour for triggers
with row movement.

If we have an AR UPDATE trigger on the root table, then during row
movement it does not matter whether we fire the trigger on the source or
the destination partition, because it is the same single trigger cascaded
onto both partitions. If a trigger is installed specifically on a leaf
partition, then we know it should not be fired on other partitions, since
it was made specifically for that one. The same applies to delete and
insert triggers: if installed on the parent, don't involve them in row
movement; fire them only if they are installed on leaf partitions,
regardless of whether the delete+insert was internally generated due to
row movement. Similarly we can think about BR triggers. Of course, DBAs
should be aware of triggers already installed on a table's ancestors
before installing a new one on a child table.

Overall, it becomes much clearer what to do if we decide to allow
triggers on partitioned tables.

> Is there any big difference between these approaches in terms
> of how much code is required to make this work?

You mean if we allow triggers on partitioned tables? I think we would
have to keep some flag in the trigger data (or somewhere else) indicating
that the trigger actually belongs to an upper partitioned table, and for
the internal delete+insert we would not fire such triggers. Other than
that, we don't have to decide in any special way which trigger to fire on
which table.
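
Just as a sketch of the decision I mean (this is not code from the patch;
the trig_from_parent flag is hypothetical, nothing like it exists today):

#include <stdbool.h>

/*
 * Sketch only: should a given trigger fire for the DELETE or INSERT that
 * internally implements cross-partition row movement?  trig_from_parent
 * stands for hypothetical bookkeeping recording whether the trigger was
 * cascaded down from the partitioned parent or created directly on the
 * leaf partition.
 */
static bool
fire_trigger_for_internal_delete_insert(bool trig_from_parent,
										bool is_row_movement)
{
	if (!is_row_movement)
		return true;			/* ordinary DELETE/INSERT: fire as usual */

	/*
	 * Parent-level triggers are handled as UPDATE triggers instead (BR on
	 * the source partition, AR on the destination), so they must not also
	 * fire for the internal delete+insert.
	 */
	if (trig_from_parent)
		return false;

	/* Installed specifically on this leaf partition: fire it. */
	return true;
}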

>
> In terms of the approach taken by the patch itself, it seems
> surprising to me that the patch only calls
> ExecSetupPartitionTupleRouting when an update fails the partition
> constraint. Note that in the insert case, we call that function at
> the start of execution; calling it in the middle seems to involve
> additional hazards; for example, is it really safe to add additional
> ResultRelInfos midway through the operation?

I thought that since the additional ResultRelInfos go into
mtstate->mt_partitions, which is independent of
estate->es_result_relations, that should be safe.

> Is it safe to take more locks midway through the operation?

I can imagine a case where some rows have already been updated, while
other tasks like ALTER TABLE or CREATE INDEX happen on other partitions
which are still unlocked, and then for row movement we try to lock these
other partitions and wait for the DDL tasks to complete. But I didn't see
any particular issue with that; correct me if you suspect one. One issue
could arise if such DDL were able to modify the table attributes, but I
believe we cannot do that for inherited columns.

> It seems like it might be a lot
> safer to decide at the beginning of the operation whether this is
> needed -- we can skip it if none of the columns involved in the
> partition key (or partition key expressions) are mentioned in the
> update.
> (There's also the issue of triggers,

The reason I thought it cannot be done at the start of execution is that
even if we know the UPDATE is not modifying any partition key column, we
are not certain that the final NEW row has its partition key columns
unchanged, because of triggers. I understand it might be unusual for a
user's trigger to modify a partition key value, but if one does, it will
result in a crash, because we won't have set up the partition routing,
having concluded that no partition key column is in the UPDATE. And we
also cannot unconditionally set up the partition routing for all updates,
for performance reasons.

> I'm not sure that it's sensible to allow a trigger on an
> individual partition to reroute an update to another partition --
> what if we get an infinite loop?)

You mean, if the other table has another trigger that will again route
the row back to the original partition? But couldn't this infinite loop
problem occur even for two normal tables?

>
> +    if (concurrently_deleted)
> +        return NULL;
>
> I don't understand the motivation for this change, and there are no
> comments explaining it that I can see.

Yeah, I think the comments are missing. I thought they were there in
ExecDelete(), but they are not. If a concurrent delete has already
deleted the row, we should not bother about moving the row; hence the
above code.

> Perhaps the concurrency-related (i.e. EPQ) behavior here could be
> tested via the isolation tester.

Will check.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
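
PS: For reference, the kind of "does the UPDATE touch the partition key"
test being discussed above could look roughly like the sketch below
(untested, function name made up). As said above, this check alone would
not be enough, because a BR trigger can still change a key column:

#include "postgres.h"
#include "access/attnum.h"		/* AttrNumber, InvalidAttrNumber */
#include "access/sysattr.h"		/* FirstLowInvalidHeapAttributeNumber */
#include "nodes/bitmapset.h"
#include "utils/rel.h"			/* Relation, RelationGetPartitionKey */

/*
 * Sketch: does this UPDATE assign to any partition key column?
 * updatedCols is the per-RTE set of assigned columns, offset by
 * FirstLowInvalidHeapAttributeNumber as usual.
 */
static bool
update_assigns_partition_key(Relation rel, Bitmapset *updatedCols)
{
	PartitionKey key = RelationGetPartitionKey(rel);
	int			i;

	for (i = 0; i < key->partnatts; i++)
	{
		AttrNumber	attno = key->partattrs[i];

		/*
		 * A zero attribute number means this key column is an expression;
		 * be conservative and assume the UPDATE can affect it.
		 */
		if (attno == InvalidAttrNumber)
			return true;

		if (bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
						  updatedCols))
			return true;
	}

	return false;
}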