Hi, I noticed that there is (another) oddity in commit aa09cd242f: in the !fpinfo->use_remote_estimate mode, when first called for costing an unsorted foreign scan, estimate_path_cost_size() computes retrieved_rows, which is the estimated number of rows fetched from the remote server, by these:
retrieved_rows = clamp_row_est(rows / fpinfo->local_conds_sel) retrieved_rows = Min(retrieved_rows, foreignrel->tuples); where rows is the estimated number of output rows emitted from the foreign scan, fpinfo->local_conds_sel is the selectivity of local conditions, and foreignrel->tuples is the foreign table's reltuples. This is good, BUT when next called for costing the presorted foreign scan, that function re-computes retrieved_rows by the former, but doesn't clamp it by the latter, which would produce wrong results. Here is such an example: create table t (a int, b int); create foreign table ft (a int, b int) server loopback options (table_name 't');insert into ft values (1, 10); insert into ft values (2, 20); analyze ft; create function postgres_fdw_abs(int) returns int as $$ begin return abs($1); end $$ language plpgsql immutable; explain verbose select * from ft where postgres_fdw_abs(b) > 10 order by a; QUERY PLAN ------------------------------------------------------------------- Foreign Scan on public.ft (cost=100.00..101.89 rows=1 width=8) Output: a, b Filter: (postgres_fdw_abs(ft.b) > 10) Remote SQL: SELECT a, b FROM public.t ORDER BY a ASC NULLS LAST (4 rows) For this query, we have rows=1 and foreignrel->tuples=2. postgres_fdw_abs(b) > 10 is a local condition, for which we have fpinfo->local_conds_sel=0.333333 (I got this by printf debugging). So when first called for costing an unsorted foreign scan, by the former equation retrieved_rows=3, then by the latter retrieved_rows=2, which is correct. BUT when next called for costing the presorted foreign scan, we have retrieved_rows=3, as that function doesn't clamp the retrieved_rows. This is wrong, leading to incorrect cost estimates. (This is an issue for the foreign-scan case, but I think we would have the same issue for the foreign-join case.) To fix, I propose to handle retrieved_rows in the same way as cached costs; 1) cache retrieved_rows computed in the first call of estimate_path_cost_size() into the foreign table's fpinfo, and 2) use it after the first call. Also, I'd like to propose to put this code in that function for !use_remote_estimate mode in each of the below code for the cases of foreign scan, foreign join, and foreign grouping as needed, and use the rows/width estimates stored in the fpinfo (ie, fpinfo->rows and fpinfo->width) after the first call, like the attached. /* * Use rows/width estimates made by set_baserel_size_estimates() for * base foreign relations and set_joinrel_size_estimates() for join * between foreign relations. */ rows = foreignrel->rows; width = foreignrel->reltarget->width; I think that that would make the code more consistent and easier to understand. Also, there is another two reasons: a) this code seems confusing to me for the foreign-grouping case, as the core code doesn't set foreignrel->rows at all for grouped relations. The change proposed above would avoid that confusion. And b) we can remove a change made by commit ffab494a4d, which added support for sorting grouped relations remotely in postgres_fdw. In that commit, to extend the logic for re-using cached costs to the foreign-grouping case, I modified add_foreign_grouping_paths() so that it saves the rows estimate for a grouped relation made by estimate_path_cost_size() into the grouped relation's foreignrel->rows. But for grouped relations, we already save the row/width estimates into fpinfo->rows and fpinfo->width, so the change proposed above would make that change unnecessary. Other change is: I noticed that commit 7012b132d0 incorrectly re-sets the width estimates for grouped relations in the !fpinfo->use_remote_estimate mode, so I fixed that as well in the attached. Best regards, Etsuro Fujita
fix-estimate_path_cost_size.patch
Description: Binary data