On 30 August 2017 at 17:32, Amit Khandekar <amitdkhan...@gmail.com> wrote: > On 16 August 2017 at 18:34, Robert Haas <robertmh...@gmail.com> wrote: >> Thanks for the benchmarking results! >> >> On Tue, Aug 15, 2017 at 11:35 PM, Rafia Sabih >> <rafia.sa...@enterprisedb.com> wrote: >>> Q4 | 244 | 12 | PA and PWJ, time by only PWJ - 41 >> >> 12 seconds instead of 244? Whoa. I find it curious that we picked a >> Parallel Append with a bunch of non-partial plans when we could've >> just as easily picked partial plans, or so it seems to me. To put >> that another way, why did we end up with a bunch of Bitmap Heap Scans >> here instead of Parallel Bitmap Heap Scans? > > Actually, the cost difference would be quite low for Parallel Append > with partial plans and Parallel Append with non-partial plans with 2 > workers. But yes, I should take a look at why it is consistently > taking non-partial Bitmap Heap Scan.
Here, I checked that Partial Bitmap Heap Scan Path is not getting created in the first place; but I think it should. As you can see from the below plan snippet, the inner path of the join is a parameterized Index Scan : -> Parallel Append -> Nested Loop Semi Join -> Bitmap Heap Scan on orders_004 Recheck Cond: ((o_orderdate >= '1994-01-01'::date) AND (o_orderdate < '1994-04-01 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on idx_orders_orderdate_004 Index Cond: ((o_orderdate >= '1994-01-01'::date) AND (o_orderdate < '1994-04-01 00:00:00'::timestamp without time zone)) -> Index Scan using idx_lineitem_orderkey_004 on lineitem_004 Index Cond: (l_orderkey = orders_004.o_orderkey) Filter: (l_commitdate < l_receiptdate) In the index condition of the inner IndexScan path, it is referencing partition order_004 which is used by the outer path. So this should satisfy the partial join path restriction concerning parameterized inner path : "inner path should not refer to relations *outside* the join path". Here, it is referring to relations *inside* the join path. But still this join path gets rejected by try_partial_nestloop_path(), here : if (inner_path->param_info != NULL) { Relids inner_paramrels = inner_path->param_info->ppi_req_outer; if (!bms_is_subset(inner_paramrels, outer_path->parent->relids)) return; } Actually, bms_is_subset() above should return true, because inner_paramrels and outer_path relids should have orders_004. But that's not happening. inner_paramrels is referring to orders, not orders_004. And hence bms_is_subset() returns false (thereby rejecting the partial nestloop path). I suspect this is because the innerpath is not getting reparameterized so as to refer to child relations. In the PWJ patch, I saw that reparameterize_path_by_child() is called by try_nestloop_path(), but not by try_partial_nestloop_path(). Now, for Parallel Append, if this partial nestloop subpath gets created, it may or may not get chosen, depending upon the number of workers. For e.g. if the number of workers is 6, and ParalleAppend+PWJ runs with only 2 partitions, then partial nestedloop join would definitely win because we can put all 6 workers to work, whereas for ParallelAppend with all non-partial nested loop join subpaths, at the most only 2 workers could be allotted, one for each child. But if the partitions are more, and available workers are less, then I think the cost difference in case of partial versus non-partial join paths would not be significant. But here the issue is, partial nest loop subpaths don't get created in the first place. Looking at the above analysis, this issue should be worked by a different thread, not in this one. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers