> On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> >> If I would make a proof-of-concept patch with interface itself, it
> >> seems to me file_fdw may be a good candidate for this enhancement.
> >> It is not a field for postgres_fdw.
> >>
> > The attached patch is enhancement of FDW/CSP interface and PoC feature
> > of file_fdw to scan source file partially. It was smaller enhancement
> > than my expectations.
> >
> > It works as follows. This query tried to read 20M rows from a CSV file,
> > using 3 background worker processes.
> >
> > postgres=# set max_parallel_degree = 3;
> > SET
> > postgres=# explain analyze select * from test_csv where id % 20 = 6;
> >                                   QUERY PLAN
> >
> ----------------------------------------------------------------------------
> ----
> >  Gather  (cost=1000.00..194108.60 rows=94056 width=52)
> >          (actual time=0.570..19268.010 rows=2000000 loops=1)
> >    Number of Workers: 3
> >    ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056
> width=52)
> >                                   (actual time=0.180..12744.655 rows=500000
> loops=4)
> >          Filter: ((id % 20) = 6)
> >          Rows Removed by Filter: 9500000
> >          Foreign File: /tmp/testdata.csv
> >          Foreign File Size: 1504892535
> >  Planning time: 0.147 ms
> >  Execution time: 19330.201 ms
> > (9 rows)
> 
> Could you try it not in parallel and then with 1, 2, 3, and 4 workers
> and post the times for all?
>
The above query has 5% selectivity on the entire CSV file.
Its execution time (total, only ForeignScan) are below

             total         ForeignScan        diff
0 workers: 17584.319 ms   17555.904 ms      28.415 ms
1 workers: 18464.476 ms   18110.968 ms     353.508 ms
2 workers: 19042.755 ms   14580.335 ms    4462.420 ms
3 workers: 19318.254 ms   12668.912 ms    6649.342 ms
4 workers: 21732.910 ms   13596.788 ms    8136.122 ms
5 workers: 23486.846 ms   14533.409 ms    8953.437 ms

This workstation has 4 CPU cores, so it is natural nworkers=3 records the
peak performance on ForeignScan portion. On the other hands, nworkers>1 also
recorded unignorable time consumption (probably, by Gather node?)

An interesting observation was, less selectivity (1% and 0%) didn't change the
result so much. Something consumes CPU time other than file_fdw.

* selectivity 1%
               total       ForeignScan       diff
0 workers: 17573.572 ms   17566.875 ms      6.697 ms
1 workers: 18098.070 ms   18020.790 ms     77.280 ms
2 workers: 18676.078 ms   14600.749 ms   4075.329 ms
3 workers: 18830.597 ms   12731.459 ms   6099.138 ms
4 workers: 21015.842 ms   13590.657 ms   7425.185 ms
5 workers: 22865.496 ms   14634.342 ms   8231.154 ms

* selectivity 0% (...so Gather didn't work hard actually)
              total        ForeignScan       diff
0 workers: 17551.011 ms   17550.811 ms      0.200 ms
1 workers: 18055.185 ms   18048.975 ms      6.210 ms
2 workers: 18567.660 ms   14593.974 ms   3973.686 ms
3 workers: 18649.819 ms   12671.429 ms   5978.390 ms
4 workers: 20619.184 ms   13606.715 ms   7012.469 ms
5 workers: 22557.575 ms   14594.420 ms   7963.155 ms

Further investigation will need....

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>

postgres=# explain analyze select * from test_csv where id % 100 = 100;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on test_csv  (cost=0.00..2158874.49 rows=94056 width=52) (actual 
time=17550.811..17550.811 rows=0 loops=1)
   Filter: ((id % 100) = 100)
   Rows Removed by Filter: 20000000
   Foreign File: /tmp/testdata.csv
   Foreign File Size: 1504892535
 Planning time: 1.175 ms
 Execution time: 17551.011 ms
(7 rows)

postgres=# SET max_parallel_degree = 1;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=18054.651..18054.651 rows=0 loops=1)
   Number of Workers: 1
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=18048.975..18048.975 rows=0 loops=2)
         Filter: ((id % 100) = 100)
         Rows Removed by Filter: 20000000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.461 ms
 Execution time: 18055.185 ms
(9 rows)

postgres=# SET max_parallel_degree = 2;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=18567.518..18567.518 rows=0 loops=1)
   Number of Workers: 2
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=14593.974..14593.974 rows=0 loops=3)
         Filter: ((id % 100) = 100)
         Rows Removed by Filter: 13333333
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.133 ms
 Execution time: 18567.660 ms
(9 rows)

postgres=# SET max_parallel_degree = 3;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=18649.676..18649.676 rows=0 loops=1)
   Number of Workers: 3
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=12671.429..12671.429 rows=0 loops=4)
         Filter: ((id % 100) = 100)
         Rows Removed by Filter: 10000000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.132 ms
 Execution time: 18649.819 ms
(9 rows)

postgres=# SET max_parallel_degree = 4;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=20619.039..20619.039 rows=0 loops=1)
   Number of Workers: 4
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=13606.715..13606.715 rows=0 loops=5)
         Filter: ((id % 100) = 100)
         Rows Removed by Filter: 8000000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.133 ms
 Execution time: 20619.184 ms
(9 rows)

postgres=# SET max_parallel_degree = 5;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=22557.451..22557.451 rows=0 loops=1)
   Number of Workers: 5
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=14594.420..14594.420 rows=0 loops=6)
         Filter: ((id % 100) = 100)
         Rows Removed by Filter: 6666667
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.125 ms
 Execution time: 22557.575 ms
(9 rows)
postgres=# explain analyze select * from test_csv where id % 100 = 56;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on test_csv  (cost=0.00..2158874.49 rows=94056 width=52) (actual 
time=0.353..17566.875 rows=200000 loops=1)
   Filter: ((id % 100) = 56)
   Rows Removed by Filter: 19800000
   Foreign File: /tmp/testdata.csv
   Foreign File Size: 1504892535
 Planning time: 1.202 ms
 Execution time: 17573.572 ms
(7 rows)

postgres=# SET max_parallel_degree = 1;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 56;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=1.756..18085.642 rows=400000 loops=1)
   Number of Workers: 1
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.300..18020.790 rows=200000 loops=2)
         Filter: ((id % 100) = 56)
         Rows Removed by Filter: 19800000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.143 ms
 Execution time: 18098.070 ms
(9 rows)

postgres=# SET max_parallel_degree = 2;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 56;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.835..18663.106 rows=400000 loops=1)
   Number of Workers: 2
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.513..14600.749 rows=133333 loops=3)
         Filter: ((id % 100) = 56)
         Rows Removed by Filter: 13200000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.134 ms
 Execution time: 18676.078 ms
(9 rows)

postgres=# SET max_parallel_degree = 3;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 56;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.824..18817.112 rows=400000 loops=1)
   Number of Workers: 3
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.511..12731.459 rows=100000 loops=4)
         Filter: ((id % 100) = 56)
         Rows Removed by Filter: 9900000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.120 ms
 Execution time: 18830.597 ms
(9 rows)

postgres=# SET max_parallel_degree = 4;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 56;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.851..21002.776 rows=400000 loops=1)
   Number of Workers: 4
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=4.608..13590.657 rows=80000 loops=5)
         Filter: ((id % 100) = 56)
         Rows Removed by Filter: 7920000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.133 ms
 Execution time: 21015.842 ms
(9 rows)

postgres=# SET max_parallel_degree = 5;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 56;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.878..22852.304 rows=400000 loops=1)
   Number of Workers: 5
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=11.043..14634.342 rows=66667 loops=6)
         Filter: ((id % 100) = 56)
         Rows Removed by Filter: 6600000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.122 ms
 Execution time: 22865.496 ms
(9 rows)
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on test_csv  (cost=0.00..2158874.49 rows=94056 width=52) (actual 
time=0.084..17555.904 rows=1000000 loops=1)
   Filter: ((id % 20) = 6)
   Rows Removed by Filter: 19000000
   Foreign File: /tmp/testdata.csv
   Foreign File Size: 1504892535
 Planning time: 0.122 ms
 Execution time: 17584.319 ms
(7 rows)

postgres=# SET max_parallel_degree = 1;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=1.435..18406.866 rows=2000000 loops=1)
   Number of Workers: 1
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.117..18110.968 rows=1000000 loops=2)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 19000000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.121 ms
 Execution time: 18464.476 ms
(9 rows)

postgres=# SET max_parallel_degree = 2;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.352..18983.054 rows=2000000 loops=1)
   Number of Workers: 2
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.114..14580.335 rows=666667 loops=3)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 12666667
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.091 ms
 Execution time: 19042.755 ms
(9 rows)

postgres=# SET max_parallel_degree = 3;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.502..19257.240 rows=2000000 loops=1)
   Number of Workers: 3
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=0.225..12668.912 rows=500000 loops=4)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 9500000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.132 ms
 Execution time: 19318.254 ms
(9 rows)

postgres=# SET max_parallel_degree = 4;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.624..21662.065 rows=2000000 loops=1)
   Number of Workers: 4
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=5.084..13596.788 rows=400000 loops=5)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 7600000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.130 ms
 Execution time: 21732.910 ms
(9 rows)

postgres=# SET max_parallel_degree = 5;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52) (actual 
time=0.561..23396.344 rows=2000000 loops=1)
   Number of Workers: 5
   ->  Parallel Foreign Scan on test_csv  (cost=0.00..183703.00 rows=94056 
width=52) (actual time=1.726..14533.409 rows=333333 loops=6)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 6333333
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.118 ms
 Execution time: 23486.846 ms
(9 rows)
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to