Dear hackers,

I recently found a weird behaviour involving FDW (postgres_fdw) and
planning.

Here’s a simplified use-case:

Given a remote table (say on server2) with the following definition:

CREATE TABLE t1(
  ts timestamp without time zone,
  x  bigint,
  x2 text
);
-- Then populate the t1 table:
INSERT INTO t1
       SELECT
        current_timestamp - 1000*random()*'1 day'::interval
        ,x
        ,''||x
       FROM
        generate_series(1,100000) as x;


This table is imported into a dedicated schema on server1 (we do not use
use_remote_estimate), keeping the t1 name:

On server1:

CREATE SERVER server2
       FOREIGN DATA WRAPPER  postgres_fdw
       OPTIONS (
              host '127.0.0.1',
              port '9002',
              dbname 'postgres',
              use_remote_estimate 'false'
       );
CREATE USER MAPPING FOR jc
       SERVER server2
       OPTIONS (user 'jc');
CREATE SCHEMA remote;

IMPORT FOREIGN SCHEMA public
       FROM SERVER server2
       INTO remote;

On a stock PostgreSQL 15 instance, the following query using date_trunc()
results in this plan:

jc=# explain (verbose,analyze) select date_trunc('day',ts), count(1) from remote.t1 group by date_trunc('day',ts) order by 1;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=216.14..216.64 rows=200 width=16) (actual time=116.699..116.727 rows=1001 loops=1)
   Output: (date_trunc('day'::text, ts)), (count(1))
   Sort Key: (date_trunc('day'::text, t1.ts))
   Sort Method: quicksort  Memory: 79kB
   ->  HashAggregate  (cost=206.00..208.50 rows=200 width=16) (actual time=116.452..116.532 rows=1001 loops=1)
         Output: (date_trunc('day'::text, ts)), count(1)
         Group Key: date_trunc('day'::text, t1.ts)
         Batches: 1  Memory Usage: 209kB
         ->  Foreign Scan on remote.t1  (cost=100.00..193.20 rows=2560 width=8) (actual time=0.384..106.225 rows=100000 loops=1)
               Output: date_trunc('day'::text, ts)
               Remote SQL: SELECT ts FROM public.t1
 Planning Time: 0.077 ms
 Execution Time: 117.028 ms


Whereas the same query with date_bin():

jc=# explain (verbose,analyze) select date_bin('1day',ts,'2023-01-01'), count(1) from remote.t1 group by 1 order by 1;
                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=113.44..164.17 rows=200 width=16) (actual time=11.297..16.312 rows=1001 loops=1)
   Output: (date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp without time zone)), (count(1))
   Relations: Aggregate on (remote.t1)
   Remote SQL: SELECT date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp without time zone), count(1) FROM public.t1 GROUP BY 1 ORDER BY date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp without time zone) ASC NULLS LAST
 Planning Time: 0.114 ms
 Execution Time: 16.599 ms



With date_bin() the whole expression is pushed down to the remote server,
whereas with date_trunc() it’s not.

I dug into the code and debugged it live. It turns out that the decision
whether to push down a whole query depends on many factors, such as
volatility and collation. In the date_trunc() case, the problem is all
about collation (date_trunc() on timestamp without time zone). The
decision is made in foreign_expr_walker() in deparse.c (
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/postgres_fdw/deparse.c;h=efaf387890e3f85c419748ec3af5d1e9696c9c4c;hb=86648dcdaec67b83cec20a9d25b45ec089a7c624#l468
)

First, the function is tested for shippability (i.e. whether it is able to
be pushed down); both date_trunc() and date_bin() pass this test.
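
(Shippability here essentially means the function is built-in, or belongs
to an extension listed in the server's "extensions" option. Both functions
are built-ins living in pg_catalog, which an illustrative catalog query --
not part of the original debugging session -- can confirm:)

SELECT p.oid, p.proname
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pg_catalog'
  AND p.proname IN ('date_trunc', 'date_bin');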

Then the argument sub-expressions are evaluated for collation and
"shippability", and all of them pass for both functions.

Then we arrive at this code portion:

if (fe->inputcollid == InvalidOid)
    /* OK, inputs are all noncollatable */ ;
else if (inner_cxt.state != FDW_COLLATE_SAFE ||
         fe->inputcollid != inner_cxt.collation)
    return false;

For the date_trunc() function:

   - fe contains the merged constraints of the sub-expressions/arguments,
     such as fe->inputcollid. This field evaluates to 100 (the default
     collation), so the code falls into the else branch and evaluates its
     predicates. This inputcollid of 100 comes from the 'day' text
     argument.

   - inner_cxt.state contains FDW_COLLATE_NONE (not FDW_COLLATE_SAFE) and
     inner_cxt.collation contains 0 (InvalidOid), so the control flow
     returns false and the function cannot be pushed down.
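
(As a side note, collation OID 100 is the database default collation; a
quick look at the catalog confirms it. This check is illustrative, not
part of the original debugging session:)

SELECT oid, collname FROM pg_collation WHERE oid = 100;
--  oid | collname
-- -----+----------
--  100 | default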

For the date_bin() function:

   - fe again contains the merged constraints of the
     sub-expressions/arguments. Here, fe->inputcollid evaluates to 0
     (InvalidOid), so the else branch is skipped and the control flow
     continues in the function.

For date_bin(), all arguments are "non-collatable" (timestamp without
time zone and interval).
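
The difference is visible directly in the catalog. Here is an
illustrative query (not from the original email) comparing the two
signatures:

SELECT proname, pg_get_function_arguments(oid) AS arguments
FROM pg_proc
WHERE proname IN ('date_trunc', 'date_bin');

-- date_trunc's first argument is text (collatable), while date_bin only
-- takes interval and timestamp arguments (non-collatable), hence the
-- differing fe->inputcollid values.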

So the situation is that date_trunc(), a function whose result is
"non-collatable", fails to be pushed down even though it may be a good
idea to do so.

Maybe we could add another condition to the first if statement, in order
to allow a "non-collatable" function to be pushed down even if it has
"collatable" parameters. I'm not sure about possible behavioural
regressions from this change, but it seems to work fine with date_trunc()
and date_part() (which suffers from the same problem).
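
For instance, a date_part() flavour of the test query (illustrative,
assuming the same remote.t1 setup as above) can be used to check the
behaviour before and after the change:

explain (verbose) select date_part('day', ts), count(1)
from remote.t1 group by 1 order by 1;
-- before the change the aggregation stays local; after it, the plan
-- should show "Relations: Aggregate on (remote.t1)" as in the
-- date_bin() plan above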

Here’s the change:

/*
 * If function's input collation is not derived from a foreign
 * Var, it can't be sent to remote.
 */
if (fe->inputcollid == InvalidOid ||
    fe->funccollid == InvalidOid)
    /* OK, inputs are all noncollatable */ ;
else if (inner_cxt.state != FDW_COLLATE_SAFE ||
         fe->inputcollid != inner_cxt.collation)
    return false;

I don’t claim this patch is free of side effects or fits all use cases.

A (tiny) patch is attached to this email; it applies against master/HEAD
at the time of writing.
Thank you for any thoughts.

-- 
Jean-Christophe Arnu
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 09d6dd60dd..16b938ebb2 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -568,7 +568,8 @@ foreign_expr_walker(Node *node,
 				 * If function's input collation is not derived from a foreign
 				 * Var, it can't be sent to remote.
 				 */
-				if (fe->inputcollid == InvalidOid)
+				if (fe->inputcollid == InvalidOid ||
+					 fe->funccollid == InvalidOid)
 					 /* OK, inputs are all noncollatable */ ;
 				else if (inner_cxt.state != FDW_COLLATE_SAFE ||
 						 fe->inputcollid != inner_cxt.collation)
