I need to do a join between two foreign tables using columns of different
types.

select data from remote2 join remote1 on ((remote2.id)::bigint=remote1.id)
where cutoff > 0.9999;

For demonstration purposes, I use a loop-back foreign server, set up in the
attached sql file.

If I do the join directly on the "foreign" server by specifying the
schema name where the physical tables live, it uses a sensible join plan:
it uses an index on the cutoff column to get a handful of rows, then casts
the id column and uses an index on remote1.id to fetch each matching row.

explain analyze select data from remote.remote2 join remote.remote1
on ((remote2.id)::bigint = remote1.id) where cutoff > 0.9999;

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.56..1100.48 rows=100 width=8) (actual
time=0.303..5.598 rows=119 loops=1)
   ->  Bitmap Heap Scan on remote2  (cost=5.13..334.85 rows=91 width=7)
(actual time=0.112..0.899 rows=105 loops=1)
         Recheck Cond: (cutoff > '0.9999'::double precision)
         Heap Blocks: exact=105
         ->  Bitmap Index Scan on remote2_cutoff_idx  (cost=0.00..5.11
rows=91 width=0) (actual time=0.062..0.062 rows=105 loops=1)
               Index Cond: (cutoff > '0.9999'::double precision)
   ->  Index Scan using remote1_id_idx on remote1  (cost=0.43..8.40 rows=1
width=16) (actual time=0.038..0.041 rows=1 loops=105)
         Index Cond: (id = (remote2.id)::bigint)


But if I go through the foreign machinery, it doesn't use a good plan:

explain analyze select data from remote2 join remote1
on ((remote2.id)::bigint = remote1.id) where cutoff > 0.9999;

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=537.81..76743.81 rows=455000 width=4) (actual
time=75.019..4659.802 rows=119 loops=1)
   Hash Cond: (remote1.id = (remote2.id)::bigint)
   ->  Foreign Scan on remote1  (cost=100.00..35506.00 rows=1000000
width=16) (actual time=1.110..4143.655 rows=1000000 loops=1)
   ->  Hash  (cost=436.67..436.67 rows=91 width=7) (actual
time=2.754..2.754 rows=105 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 13kB
         ->  Foreign Scan on remote2  (cost=105.13..436.67 rows=91 width=7)
(actual time=1.567..2.646 rows=105 loops=1)
 Planning time: 29.629 ms
 Execution time: 4660.433 ms

I thought it would either push the entire join to the foreign side, or at
least do a foreign index scan on remote2_cutoff_idx, then loop over each
row and do a foreign index scan against remote1_id_idx.
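For reference, full pushdown would correspond to postgres_fdw shipping
roughly the following single query to the remote server (a sketch of the
SQL I'd expect it to generate, not the FDW's actual deparsed output):

```sql
-- Hypothetical remote query if the whole join were pushed down; the
-- cast, the join, and the cutoff filter would all run on the foreign
-- server, returning only the ~119 matching rows.
SELECT r1.data
FROM remote.remote2 r2
JOIN remote.remote1 r1 ON (r2.id)::bigint = r1.id
WHERE r2.cutoff > 0.9999;
```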

I've tried versions 9.6.3 and 10dev, and neither does what I expected.  It
doesn't seem to be a planning problem where it thinks the fast plan is
slower; it just doesn't seem to consider the faster plans as options at
all.  Is there some setting to make it realize the cast is shippable?
Is any of the work being done on postgres_fdw for v11 working towards
fixing this?
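In the meantime, one workaround that sidesteps (rather than fixes) the
planner issue is to define the join as a view on the remote side and point
a foreign table at it, so the whole join necessarily executes remotely.
A sketch, with illustrative names:

```sql
-- On the remote side: a view that performs the join (and the cast) there.
CREATE VIEW remote.remote_joined AS
  SELECT r1.data, r2.cutoff
  FROM remote.remote2 r2
  JOIN remote.remote1 r1 ON (r2.id)::bigint = r1.id;

-- Locally: a foreign table over the view.  The cutoff comparison uses a
-- built-in operator on a built-in type, so the WHERE clause is shippable
-- and only the matching rows come back over the wire.
CREATE FOREIGN TABLE remote_joined (
    data bigint,
    cutoff double precision
)
SERVER remote_server
OPTIONS (schema_name 'remote', table_name 'remote_joined',
         use_remote_estimate 'true');

SELECT data FROM remote_joined WHERE cutoff > 0.9999;
```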

Cheers,

Jeff
-- Attached SQL file: loop-back foreign server setup.
drop database if exists foobar;
create database foobar;
\c foobar

CREATE EXTENSION IF NOT EXISTS postgres_fdw WITH SCHEMA public;

CREATE SERVER remote_server FOREIGN DATA WRAPPER postgres_fdw options (dbname 
'foobar') ;
CREATE USER MAPPING FOR postgres SERVER remote_server OPTIONS (
    "user" 'postgres'
);

CREATE SCHEMA remote;
SET search_path = remote, pg_catalog;

CREATE TABLE remote1 as select floor(random()*10000000)::bigint as id, 
floor(random()*10000000)::bigint as data from generate_series(1,1000000);
create index on remote1 (id);
CREATE TABLE remote2 as select id::text as id, random() as cutoff from remote1;
create index on remote2 (cutoff);
vacuum analyze;

reset search_path;

CREATE FOREIGN TABLE remote1 (
   id bigint,   -- match the remote column types (both bigint)
   data bigint
)
SERVER remote_server
OPTIONS (
    schema_name 'remote',
    table_name 'remote1',
    use_remote_estimate 'true'
);

CREATE FOREIGN TABLE remote2 (
   id text,
   cutoff double precision 
)
SERVER remote_server
OPTIONS (
    schema_name 'remote',
    table_name 'remote2',
    use_remote_estimate 'true'
);

analyze;

explain analyze select data from remote2 join remote1 on 
((remote2.id)::bigint=remote1.id) where cutoff > 0.9999;
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)