The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:

    /*
     * If data file was sorted, and we knew it somehow, we could insert
     * appropriate pathkeys into the ForeignPath node to tell the planner
     * that.
     */

To do this, I would like to propose new generic options for a file_fdw foreign
table to specify the sortedness of a data file.  While it is best to allow to
specify the sortedness on multiple columns, the current interface for the
generic options dose not seems to be suitable for doing it.  As a compromise, I
would like to propose single-column sortedness options and insert appropriate
pathkeys into the ForeignPath node based on these information:

 sorted: Boolean option to specify whether data file is sorted by a column.
 key: Specifies the name of a column by which data file is sorted.  Required
when the above option is set to true.
 direction: Specifies the sort order: asc or desc.  The default is asc.
 nulls: Specifies that nulls sort before or after non-nulls: first or last.
first is the default when direction option is set desc.  When direction option
is not set desc, last is the default.

Attached is a WIP patch implementing this feature.  I would like to demonstrate
the usefulness of the patch.  Experimental results are shown below.  Here, data
in /home/pgsql/relation.csv is sorted by aid in ascending order.

postgres=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER fs FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
postgres=# CREATE FOREIGN TABLE ft (aid INTEGER, bid INTEGER) SERVER fs OPTIONS
(filename '/home/pgsql/relation.csv', format 'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# ANALYZE ft;
ANALYZE
postgres=# SELECT count(*) FROM ft;
  count
----------
 50000000
(1 row)

postgres=# SELECT count(DISTINCT aid) FROM ft;
  count
---------
 1000000
(1 row)

postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
                                                             QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------------
 Unique  (cost=14187375.19..14437375.19 rows=1092929 width=4) (actual time=48952
.602..62788.934 rows=1000000 loops=1)
   ->  Sort  (cost=14187375.19..14312375.19 rows=50000000 width=4) (actual time=
48952.601..56635.448 rows=50000000 loops=1)
         Sort Key: aid
         Sort Method: external sort  Disk: 684272kB
         ->  Foreign Scan on ft  (cost=0.00..5059137.00 rows=50000000 width=4) (
actual time=0.073..18324.062 rows=50000000 loops=1)
               Foreign File: /home/pgsql/relation.csv
               Foreign File Size: 484444500
 Total runtime: 63019.868 ms
(8 rows)

postgres=# ALTER FOREIGN TABLE ft OPTIONS ( ADD  sorted 'true', key 'aid' );
ALTER FOREIGN TABLE
postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
                                                          QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------
 Unique  (cost=0.00..5184137.00 rows=1092929 width=4) (actual time=0.074..23124.
195 rows=1000000 loops=1)
   ->  Foreign Scan on ft  (cost=0.00..5059137.00 rows=50000000 width=4) (actual
 time=0.070..17633.821 rows=50000000 loops=1)
         Foreign File: /home/pgsql/relation.csv
         Foreign File Size: 484444500
 Total runtime: 23213.909 ms
(5 rows)

Any comments and suggestions are welcomed.

Thanks,

Best regards,
Etsuro Fujita

Attachment: file-fdw-pathkeys-0802.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to