On Wed, Sep 13, 2017 at 12:51 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > On Wed, Sep 13, 2017 at 12:39 AM, Robert Haas <robertmh...@gmail.com> wrote: >> On Tue, Sep 12, 2017 at 3:46 AM, Amit Langote >> <langote_amit...@lab.ntt.co.jp> wrote: >>> In this case, AcquireExecutorLocks will lock all the relations in >>> PlannedStmt.rtable, which must include all partitioned tables of all >>> partition trees involved in the query. Of those, it will lock the tables >>> whose RT indexes appear in PlannedStmt.nonleafResultRelations with >>> RowExclusiveLock mode. PlannedStmt.nonleafResultRelations is a global >>> list of all partitioned table RT indexes obtained by concatenating >>> partitioned_rels lists of all ModifyTable nodes involved in the query >>> (set_plan_refs does that). We need to distinguish nonleafResultRelations, >>> because we need to take the stronger lock on a given table before any >>> weaker one if it happens to appear in the query as a non-result relation >>> too, to avoid lock strength upgrade deadlock hazard. >> >> Hmm. The problem with this theory in my view is that it doesn't >> explain why InitPlan() and ExecOpenScanRelation() lock the relations >> instead of just assuming that they are already locked either by >> AcquireExecutorLocks or by planning. If ExecLockNonLeafAppendTables() >> doesn't really need to take locks, then ExecOpenScanRelation() must >> not need to do it either. We invented ExecLockNonLeafAppendTables() >> on the occasion of removing the scans of those tables which would >> previously have caused ExecOpenScanRelation() to be invoked, so as to >> keep the locking behavior unchanged. >> >> AcquireExecutorLocks() looks like an odd bit of code to me. The >> executor itself locks result tables in InitPlan() and then everything >> else during InitPlan() and all of the others later on while walking >> the plan tree -- comments in InitPlan() say that this is to avoid a >> lock upgrade hazard if a result rel is also a source rel. But >> AcquireExecutorLocks() has no such provision; it just locks everything >> in RTE order. In theory, that's a deadlock hazard of another kind, as >> we just talked about in the context of EIBO. In fact, expanding in >> bound order has made the situation worse: before, expansion order and >> locking order were the same, so maybe having AcquireExecutorLocks() >> work in RTE order coincidentally happened to give the same result as >> the executor code itself as long as there are no result relations. >> But this is certainly not true any more. I'm not sure it's worth >> expending a lot of time on this -- it's evidently not a problem in >> practice, or somebody probably would've complained before now. >> >> But that having been said, I don't think we should assume that all the >> locks taken from the executor are worthless because plancache.c will >> always do the job for us. I don't know of a case where we execute a >> saved plan without going through the plan cache, but that doesn't mean >> that there isn't one or that there couldn't be one in the future. >> It's not the job of these partitioning patches to whack around the way >> we do locking in general -- they should preserve the existing behavior >> as much as possible. If we want to get rid of the locking in the >> executor altogether, that's a separate discussion where, I have a >> feeling, there will prove to be better reasons for the way things are >> than we are right now supposing. >> > > I agree that it's not the job of these patches to change the locking > or even get rid of partitioned_rels. In order to continue returning > partitioned_rels in Append paths esp. in the case of queries involving > set operations and partitioned table e.g "select 1 from t1 union all > select 2 from t1;" in which t1 is multi-level partitioned table, we > need a fix in add_paths_to_append_rels(). The fix provided in [1] is > correct but we will need a longer explanation of why we have to > involve RTE_SUBQUERY with RELKIND_PARTITIONED_TABLE. The explanation > is complicated. If we get rid of partitioned_rels, we don't need to > fix that code in add_paths_to_append_rel(). > > I suggested that [2] > -- (excerpt from [2]) > > Actually, the original problem that caused this discussion started > with an assertion failure in get_partitioned_child_rels() as > Assert(list_length(result) >= 1); > > This assertion fails if result is NIL when an intermediate partitioned > table is passed. May be we should assert (result == NIL || > list_length(result) == 1) and allow that function to be called even > for intermediate partitioned partitions for which the function will > return NIL. That will leave the code in add_paths_to_append_rel() > simple. Thoughts? > -- > > Amit Langote agrees with this. It kind of makes the assertion lame but > keeps the code sane. What do you think?
I debugged what happens in case of query "select 1 from t1 union all select 2 from t1;" with the current HEAD (without multi-level expansion patch attached). It doesn't set partitioned_rels in Append path that gets converted into Append plan. Remember t1 is a multi-level partitioned table here with t1p1 as its immediate partition and t1p1p1 as partition of t1p1. So, the set_append_rel_pathlist() recurses once as shown in the following stack trace. #0 add_paths_to_append_rel (root=0x23e4308, rel=0x23fb768, live_childrels=0x23ff5f0) at allpaths.c:1281 #1 0x000000000076e170 in set_append_rel_pathlist (root=0x23e4308, rel=0x23fb768, rti=4, rte=0x23f3268) at allpaths.c:1262 #2 0x000000000076cf23 in set_rel_pathlist (root=0x23e4308, rel=0x23fb768, rti=4, rte=0x23f3268) at allpaths.c:431 #3 0x000000000076e0f6 in set_append_rel_pathlist (root=0x23e4308, rel=0x23fb478, rti=1, rte=0x2382070) at allpaths.c:1247 #4 0x000000000076cf23 in set_rel_pathlist (root=0x23e4308, rel=0x23fb478, rti=1, rte=0x2382070) at allpaths.c:431 #5 0x000000000076cc22 in set_base_rel_pathlists (root=0x23e4308) at allpaths.c:309 When add_paths_to_append_rel() (frame 0) is called for t1, it gets partitioned_rels and stuffs it in append path/s it creates. But those paths are flattened into the append paths created for the set operations when add_paths_to_append_rels() is called from frame 3. While flattening the append paths in accumulate_append_subpath() we do not pull any partitioned_rels that are stuffed in those paths and thus the final append path/s created does not have partitioned_rels in there. The same behaviour is retained by my v30 patchset [1]. I think we should go ahead by fixing add_paths_to_append_rel() as done in that patchset. partitioned_rels needs to be removed from append paths anyway, so that code will be removed when we do that. [1] https://www.postgresql.org/message-id/CAFjFpRfHkJW3G=_pnsuc6pbxje48awywyrzagqtfkzzou4w...@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers