(2014/04/16 6:55), Hannu Krosing wrote:
----------------------------------
CREATE EXTENSION postgres_fdw;

CREATE SERVER loop foreign data wrapper postgres_fdw
   OPTIONS (port '5432', dbname 'testdb');

CREATE USER MAPPING FOR PUBLIC SERVER loop;

create table onemillion (
     id serial primary key,
     inserted timestamp default clock_timestamp(),
     data text
);

insert into onemillion(data) select random() from
generate_series(1,1000000);

CREATE FOREIGN TABLE onemillion_pgfdw (
     id int,
     inserted timestamp,
     data text
) SERVER loop
OPTIONS (table_name 'onemillion',
          use_remote_estimate 'true');

testdb=# explain analyse
select * from onemillion_pgfdw where id in (select id from onemillion
where data > '0.9' limit 100);
                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=122.49..10871.06 rows=500000 width=44) (actual time=4.269..93.444 rows=100 loops=1)
    ->  HashAggregate  (cost=22.06..23.06 rows=100 width=4) (actual time=1.110..1.263 rows=100 loops=1)
          ->  Limit  (cost=0.00..20.81 rows=100 width=4) (actual time=0.038..1.026 rows=100 loops=1)
                ->  Seq Scan on onemillion  (cost=0.00..20834.00 rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1)
                      Filter: (data > '0.9'::text)
                      Rows Removed by Filter: 805
    ->  Foreign Scan on onemillion_pgfdw  (cost=100.43..108.47 rows=1 width=29) (actual time=0.772..0.773 rows=1 loops=100)
  Total runtime: 93.820 ms
(8 rows)

Time: 97.283 ms
------------------------------

... actually performs 100 distinct "SELECT * FROM onemillion WHERE id =
$1" calls on the "remote" side.

Maybe I'm missing something, but I think you can achieve what you'd like by the following procedure:

postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000');
ALTER SERVER
postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=1023.10..41983.21 rows=100 width=30)
   Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
   Hash Cond: (onemillion_pgsql.id = onemillion.id)
   ->  Foreign Scan on public.onemillion_pgsql  (cost=1000.00..39334.00 rows=1000000 width=29)
         Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
         Remote SQL: SELECT id, inserted, data FROM public.onemillion
   ->  Hash  (cost=21.85..21.85 rows=100 width=4)
         Output: onemillion.id
         ->  Limit  (cost=0.00..20.85 rows=100 width=4)
               Output: onemillion.id
               ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 rows=99918 width=4)
                     Output: onemillion.id
                     Filter: (onemillion.data > '0.9'::text)
 Planning time: 0.690 ms
(14 rows)
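(To undo this experiment afterwards, the server-level option can be removed again, for example:

postgres=# ALTER SERVER loop OPTIONS (DROP fdw_startup_cost);
ALTER SERVER
)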

Or, as Tom mentioned, by disabling the use_remote_estimate option:

postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET use_remote_estimate 'false');
ALTER FOREIGN TABLE
postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data > '0.9' LIMIT 100);
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=123.10..41083.21 rows=100 width=30)
   Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
   Hash Cond: (onemillion_pgsql.id = onemillion.id)
   ->  Foreign Scan on public.onemillion_pgsql  (cost=100.00..38434.00 rows=1000000 width=30)
         Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data
         Remote SQL: SELECT id, inserted, data FROM public.onemillion
   ->  Hash  (cost=21.85..21.85 rows=100 width=4)
         Output: onemillion.id
         ->  Limit  (cost=0.00..20.85 rows=100 width=4)
               Output: onemillion.id
               ->  Seq Scan on public.onemillion  (cost=0.00..20834.00 rows=99918 width=4)
                     Output: onemillion.id
                     Filter: (onemillion.data > '0.9'::text)
 Planning time: 0.215 ms
(14 rows)
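
(And if you later want remote estimates back for normal operation, the table-level option can be switched on again in the same way, for example:

postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET use_remote_estimate 'true');
ALTER FOREIGN TABLE
)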

Thanks,

Best regards,
Etsuro Fujita

