Re: FDW pushdown of non-collated functions

2023-10-10 Thread Jean-Christophe Arnu
Hi Ashutosh,

Le ven. 6 oct. 2023 à 14:16, Ashutosh Bapat 
a écrit :

> Hi Jean-Christophe,
>
> On Fri, Sep 8, 2023 at 11:30 PM Jean-Christophe Arnu 
> wrote:
> >
> > Maybe we could add another condition to the first if statement in order
> to allow a “no-collation” function to be pushed down even if they have
> “collatable” parameters. I’m not sure about the possible regressions of
> behaviour of this change, but it
> seems to work fine with date_trunc() and date_part() (which suffers
> the same problem).
>
> That may not work since the output of the function may be dependent
> upon the collation on the inputs.
>
> There were similar discussions earlier. E.g.
>
> https://www.postgresql.org/message-id/flat/CACowWR1ARWyRepRxGfijMcsw%2BH84Dj8x2o9N3kvz%3Dz1p%2B6b45Q%40mail.gmail.com
> .
>
> Reading Tom's first reply there you may work around this by declaring
> the collation explicitly.
>

Thanks for your reply. I did not catch  these messages in the archive.
Thanks for spotting them.


> Briefly reading Tom's reply, the problem seems to be trusting whether
> the default collation locally and on the foreign server respectively
> is same or not. May be a simple fix is to declare a foreign server
> level option declaring that the default collation on the foreign
> server is same as the local server may be a way to move forward. But
> given that the problem remains unsolved for 7 years at least, may be
> such a simple fix is not enough.
>

I studied postgres_fdw source code a bit and the problem is not as easy to
solve : one could set an option telling the default remote collation is
aligned with local per "server" but nothing guaranties that the parameter
collation is known on the «remote» side.


>
> Another solution would be to attach another attribute to a function
> indicating whether the output of that function depends upon the input
> collations or not. Doing that just for FDW may not be acceptable
> though.
>

Yes, definitely. I thought

Anyway, you're right, after 7 years, this is a really difficult problem to
solve and there's no straightforward solution (to my eyes).
Thanks again for your kind explanations
Regards

-- 
Jean-Christophe Arnu


Re: FDW pushdown of non-collated functions

2023-10-06 Thread Ashutosh Bapat
Hi Jean-Christophe,

On Fri, Sep 8, 2023 at 11:30 PM Jean-Christophe Arnu  wrote:
>
> Maybe we could add another condition to the first if statement in order to 
> allow a “no-collation” function to be pushed down even if they have 
> “collatable” parameters. I’m not sure about the possible regressions of 
> behaviour of this change, but it
seems to work fine with date_trunc() and date_part() (which suffers
the same problem).

That may not work since the output of the function may be dependent
upon the collation on the inputs.

There were similar discussions earlier. E.g.
https://www.postgresql.org/message-id/flat/CACowWR1ARWyRepRxGfijMcsw%2BH84Dj8x2o9N3kvz%3Dz1p%2B6b45Q%40mail.gmail.com.

Reading Tom's first reply there you may work around this by declaring
the collation explicitly.

Briefly reading Tom's reply, the problem seems to be trusting whether
the default collation locally and on the foreign server respectively
is same or not. May be a simple fix is to declare a foreign server
level option declaring that the default collation on the foreign
server is same as the local server may be a way to move forward. But
given that the problem remains unsolved for 7 years at least, may be
such a simple fix is not enough.

Another solution would be to attach another attribute to a function
indicating whether the output of that function depends upon the input
collations or not. Doing that just for FDW may not be acceptable
though.

-- 
Best Wishes,
Ashutosh Bapat




Re: FDW pushdown of non-collated functions

2023-10-05 Thread Jean-Christophe Arnu
Dear Hackers,

I figured out this email was sent at release time. The worst time to ask
for thoughts on a subject IMHO. Anyway, I hope this email will pop the
topic over the stack!
Thank you!

Le ven. 8 sept. 2023 à 16:41, Jean-Christophe Arnu  a
écrit :

> Dear hackers,
>
> I recently found a weird behaviour involving FDW (postgres_fdw) and
> planning.
>
> Here’s a simplified use-case:
>
> Given a remote table (say on server2) with the following definition:
>
> CREATE TABLE t1(
>   ts timestamp without time zone,
>   x  bigint,
>   x2 text
> );
> --Then populate t1 table:INSERT INTO t1
>SELECT
> current_timestamp - 1000*random()*'1 day'::interval
> ,x
> ,''||x
>FROM
> generate_series(1,10) as x;
>
>
> This table is imported in a specific schema on server1 (we do not use
> use_remote_estimate) also with t1 name in a specific schema:
>
> On server1:
>
> CREATE SERVER server2
>FOREIGN DATA WRAPPER  postgres_fdw
>OPTIONS (
>   host '127.0.0.1',
>   port '9002',
>   dbname 'postgres',
>   use_remote_estimate 'false'
>);
> CREATE USER MAPPING FOR jc
>SERVER server2
>OPTIONS (user 'jc');
> CREATE SCHEMA remote;
>
> IMPORT FOREIGN SCHEMA public
>FROM SERVER server2
>INTO remote ;
>
> On a classic PostgreSQL 15 version the following query using date_trunc()
> is executed and results in the following plan:
>
> jc=# explain (verbose,analyze) select date_trunc('day',ts), count(1) from 
> remote.t1 group by date_trunc('day',ts) order by 1;
> QUERY PLAN
>  
> ---
>  Sort  (cost=216.14..216.64 rows=200 width=16) (actual time=116.699..116.727 
> rows=1001 loops=1)
>Output: (date_trunc('day'::text, ts)), (count(1))
>Sort Key: (date_trunc('day'::text, t1.ts))
>Sort Method: quicksort  Memory: 79kB
>->  HashAggregate  (cost=206.00..208.50 rows=200 width=16) (actual 
> time=116.452..116.532 rows=1001 loops=1)
>  Output: (date_trunc('day'::text, ts)), count(1)
>  Group Key: date_trunc('day'::text, t1.ts)
>  Batches: 1  Memory Usage: 209kB
>  ->  Foreign Scan on remote.t1  (cost=100.00..193.20 rows=2560 
> width=8) (actual time=0.384..106.225 rows=10 loops=1)
>Output: date_trunc('day'::text, ts)
>Remote SQL: SELECT ts FROM public.t1
>  Planning Time: 0.077 ms
>  Execution Time: 117.028 ms
>
>
> Whereas the same query with date_bin()
>
> jc=# explain (verbose,analyze) select date_bin('1day',ts,'2023-01-01'), 
> count(1) from remote.t1 group by 1 order by 1;
>   
>   QUERY PLAN  
>   
> 
> --
>  Foreign Scan  (cost=113.44..164.17 rows=200 width=16) (actual 
> time=11.297..16.312 rows=1001 loops=1)
>Output: (date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp 
> without time zone)), (count(1))
>Relations: Aggregate on (remote.t1)
>Remote SQL: SELECT date_bin('1 day'::interval, ts, '2023-01-01 
> 00:00:00'::timestamp without time zone), count(1) FROM public.t1 GROUP BY 1 
> ORDER BY date_bin('1 day'::interval, ts, '2023-01-01 00:00:00'::timestamp 
> without time zone) ASC NULLS LAST
>  Planning Time: 0.114 ms
>  Execution Time: 16.599 ms
>
>
>
> With date_bin() the whole expression is pushed down to the remote server,
> whereas with date_trunc() it’s not.
>
> I dived into the code and live debugged. It turns out that decisions to
> pushdown or not a whole query depends on many factors like volatility and
> collation. In the date_trunc() case, the problem is all about collation (
> date_trunc() on timestamp without time zone). And decision is made in the
> foreign_expr_walker() in deparse.c (
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/postgres_fdw/deparse.c;h=efaf387890e3f85c419748ec3af5d1e9696c9c4c;hb=86648dcdaec67b83cec20a9d25b45ec089a7c624#l468
> )
>
> First the function is tested as shippable (able to be pushed down) and
> date_trunc() and date_bin() both are.
>
> Then parameters sub-expressions are evaluated with collation and
> “shippability”, and they all are with both functions.
>
> Then we arrive at this code portion:
>
> if (fe->inputcollid == InvalidOid)
>   /* OK, inputs are all noncollatable */ ;else if 

FDW pushdown of non-collated functions

2023-09-08 Thread Jean-Christophe Arnu
Dear hackers,

I recently found a weird behaviour involving FDW (postgres_fdw) and
planning.

Here’s a simplified use-case:

Given a remote table (say on server2) with the following definition:

CREATE TABLE t1(
  ts timestamp without time zone,
  x  bigint,
  x2 text
);
--Then populate t1 table:INSERT INTO t1
   SELECT
current_timestamp - 1000*random()*'1 day'::interval
,x
,''||x
   FROM
generate_series(1,10) as x;


This table is imported in a specific schema on server1 (we do not use
use_remote_estimate) also with t1 name in a specific schema:

On server1:

CREATE SERVER server2
   FOREIGN DATA WRAPPER  postgres_fdw
   OPTIONS (
  host '127.0.0.1',
  port '9002',
  dbname 'postgres',
  use_remote_estimate 'false'
   );
CREATE USER MAPPING FOR jc
   SERVER server2
   OPTIONS (user 'jc');
CREATE SCHEMA remote;

IMPORT FOREIGN SCHEMA public
   FROM SERVER server2
   INTO remote ;

On a classic PostgreSQL 15 version the following query using date_trunc()
is executed and results in the following plan:

jc=# explain (verbose,analyze) select date_trunc('day',ts), count(1)
from remote.t1 group by date_trunc('day',ts) order by 1;
QUERY PLAN

---
 Sort  (cost=216.14..216.64 rows=200 width=16) (actual
time=116.699..116.727 rows=1001 loops=1)
   Output: (date_trunc('day'::text, ts)), (count(1))
   Sort Key: (date_trunc('day'::text, t1.ts))
   Sort Method: quicksort  Memory: 79kB
   ->  HashAggregate  (cost=206.00..208.50 rows=200 width=16) (actual
time=116.452..116.532 rows=1001 loops=1)
 Output: (date_trunc('day'::text, ts)), count(1)
 Group Key: date_trunc('day'::text, t1.ts)
 Batches: 1  Memory Usage: 209kB
 ->  Foreign Scan on remote.t1  (cost=100.00..193.20 rows=2560
width=8) (actual time=0.384..106.225 rows=10 loops=1)
   Output: date_trunc('day'::text, ts)
   Remote SQL: SELECT ts FROM public.t1
 Planning Time: 0.077 ms
 Execution Time: 117.028 ms


Whereas the same query with date_bin()

jc=# explain (verbose,analyze) select
date_bin('1day',ts,'2023-01-01'), count(1) from remote.t1 group by 1
order by 1;

 QUERY PLAN


--
 Foreign Scan  (cost=113.44..164.17 rows=200 width=16) (actual
time=11.297..16.312 rows=1001 loops=1)
   Output: (date_bin('1 day'::interval, ts, '2023-01-01
00:00:00'::timestamp without time zone)), (count(1))
   Relations: Aggregate on (remote.t1)
   Remote SQL: SELECT date_bin('1 day'::interval, ts, '2023-01-01
00:00:00'::timestamp without time zone), count(1) FROM public.t1 GROUP
BY 1 ORDER BY date_bin('1 day'::interval, ts, '2023-01-01
00:00:00'::timestamp without time zone) ASC NULLS LAST
 Planning Time: 0.114 ms
 Execution Time: 16.599 ms



With date_bin() the whole expression is pushed down to the remote server,
whereas with date_trunc() it’s not.

I dived into the code and live debugged. It turns out that decisions to
pushdown or not a whole query depends on many factors like volatility and
collation. In the date_trunc() case, the problem is all about collation (
date_trunc() on timestamp without time zone). And decision is made in the
foreign_expr_walker() in deparse.c (
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/postgres_fdw/deparse.c;h=efaf387890e3f85c419748ec3af5d1e9696c9c4c;hb=86648dcdaec67b83cec20a9d25b45ec089a7c624#l468
)

First the function is tested as shippable (able to be pushed down) and
date_trunc() and date_bin() both are.

Then parameters sub-expressions are evaluated with collation and
“shippability”, and they all are with both functions.

Then we arrive at this code portion:

if (fe->inputcollid == InvalidOid)
  /* OK, inputs are all noncollatable */ ;else if (inner_cxt.state !=
FDW_COLLATE_SAFE ||
 fe->inputcollid != inner_cxt.collation)
  return false;

For date_trunc() function :

   -

   fe variable contains the sub-expressions/arguments merged constraints
   such as fe->inputcollid. This field is evaluated to 100 (default
   collation) so codes jumps to else statement and evaluates the if
   predicates. This 100 inputcollationid is due to text predicate 'day'.
   -

   inner_cxt.state contains FDW_COLLATE_STATE but inner_cxt.collation
   contains 0 (InvalidOid) so the control flow returns false thus the
   function cannot be pushed down.

For date_bin() function :

   - fe variable contains the sub-expressions/arguments merged constraints.
   Here, fe->inputcollid is