Re: [HACKERS] CustomScan under the Gather node?
> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Thursday, February 04, 2016 2:54 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgresql.org
> Subject: ##freemail## Re: [HACKERS] CustomScan under the Gather node?
>
> On Thu, Jan 28, 2016 at 8:14 PM, Kouhei Kaigai wrote:
> >>              total         ForeignScan       diff
> >> 0 workers: 17584.319 ms  17555.904 ms    28.415 ms
> >> 1 workers: 18464.476 ms  18110.968 ms   353.508 ms
> >> 2 workers: 19042.755 ms  14580.335 ms  4462.420 ms
> >> 3 workers: 19318.254 ms  12668.912 ms  6649.342 ms
> >> 4 workers: 21732.910 ms  13596.788 ms  8136.122 ms
> >> 5 workers: 23486.846 ms  14533.409 ms  8953.437 ms
> >>
> >> This workstation has 4 CPU cores, so it is natural that nworkers=3
> >> records the peak performance on the ForeignScan portion. On the other
> >> hand, nworkers>1 also recorded non-negligible time consumption
> >> (probably in the Gather node?)
> >  :
> >> Further investigation will be needed.
> >>
> > It was a bug in my file_fdw patch. The ForeignScan node in the master
> > process was also kicked by the Gather node, but it lacked the
> > coordination information because of an oversight in the initialization
> > at the InitializeDSMForeignScan callback. As a result, the local
> > ForeignScan node was still executed after the coordinated background
> > worker processes had completed, and returned twice the number of rows.
> >
> > With the revised patch, the results look reasonable to me.
> >              total         ForeignScan       diff
> > 0 workers: 17592.498 ms  17564.457 ms    28.041 ms
> > 1 workers: 12152.998 ms  11983.485 ms   169.513 ms
> > 2 workers: 10647.858 ms  10502.100 ms   145.758 ms
> > 3 workers:  9635.445 ms   9509.899 ms   125.546 ms
> > 4 workers: 11175.456 ms  10863.293 ms   312.163 ms
> > 5 workers: 12586.457 ms  12279.323 ms   307.134 ms
>
> Hmm.  Is the file_fdw part of this just a demo, or do you want to try
> to get that committed?  If so, maybe start a new thread with a more
> appropriate subject line to just talk about that.  I haven't
> scrutinized that part of the patch in any detail, but the general
> infrastructure for FDWs and custom scans to use parallelism seems to
> be in good shape, so I rewrote the documentation and committed that
> part.
>
Thanks. The file_fdw part is just for demonstration. Unlike GpuScan of
PG-Strom, it does not require any special hardware to reproduce this
parallel execution.

> Do you have any idea why this isn't scaling beyond, uh, 1 worker?
> That seems like a good thing to try to figure out.
>
The hardware I ran the above query on has 4 CPU cores, so it is not
surprising that 3 workers (+ 1 master) recorded the peak performance.

In addition, the enhancement of the file_fdw part is a corner-cutting
work. Each worker picks up the next line number to be fetched from the
shared memory segment using pg_atomic_add_fetch_u32(), then reads the
input file until it reaches the target line; unrelated lines are skipped.
Each worker parses only the lines it is responsible for, so parallel
execution makes sense in this part. On the other hand, the total amount
of CPU cycles spent on the file scan increases, because every worker
still has to scan all the lines.

If we simply split the time consumption in the 0-worker case into:

  (time to scan file; TSF) + (time to parse lines; TPL)

then the total amount of work when we distribute file_fdw across N
processes is:

  N * (TSF) + (TPL)

Thus, each process has to handle the following amount of work:

  (TSF) + (TPL)/N

This is the typical formula of Amdahl's law when the sequential part is
not small. The results above suggest the TSF part is about 7.4s and the
TPL part is about 10.1s.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CustomScan under the Gather node?
On Thu, Jan 28, 2016 at 8:14 PM, Kouhei Kaigai wrote:
>>              total         ForeignScan       diff
>> 0 workers: 17584.319 ms  17555.904 ms    28.415 ms
>> 1 workers: 18464.476 ms  18110.968 ms   353.508 ms
>> 2 workers: 19042.755 ms  14580.335 ms  4462.420 ms
>> 3 workers: 19318.254 ms  12668.912 ms  6649.342 ms
>> 4 workers: 21732.910 ms  13596.788 ms  8136.122 ms
>> 5 workers: 23486.846 ms  14533.409 ms  8953.437 ms
>>
>> This workstation has 4 CPU cores, so it is natural that nworkers=3
>> records the peak performance on the ForeignScan portion. On the other
>> hand, nworkers>1 also recorded non-negligible time consumption
>> (probably in the Gather node?)
>  :
>> Further investigation will be needed.
>>
> It was a bug in my file_fdw patch. The ForeignScan node in the master
> process was also kicked by the Gather node, but it lacked the
> coordination information because of an oversight in the initialization
> at the InitializeDSMForeignScan callback. As a result, the local
> ForeignScan node was still executed after the coordinated background
> worker processes had completed, and returned twice the number of rows.
>
> With the revised patch, the results look reasonable to me.
>              total         ForeignScan       diff
> 0 workers: 17592.498 ms  17564.457 ms    28.041 ms
> 1 workers: 12152.998 ms  11983.485 ms   169.513 ms
> 2 workers: 10647.858 ms  10502.100 ms   145.758 ms
> 3 workers:  9635.445 ms   9509.899 ms   125.546 ms
> 4 workers: 11175.456 ms  10863.293 ms   312.163 ms
> 5 workers: 12586.457 ms  12279.323 ms   307.134 ms

Hmm.  Is the file_fdw part of this just a demo, or do you want to try
to get that committed?  If so, maybe start a new thread with a more
appropriate subject line to just talk about that.  I haven't
scrutinized that part of the patch in any detail, but the general
infrastructure for FDWs and custom scans to use parallelism seems to
be in good shape, so I rewrote the documentation and committed that
part.

Do you have any idea why this isn't scaling beyond, uh, 1 worker?
That seems like a good thing to try to figure out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] CustomScan under the Gather node?
>              total         ForeignScan       diff
> 0 workers: 17584.319 ms  17555.904 ms    28.415 ms
> 1 workers: 18464.476 ms  18110.968 ms   353.508 ms
> 2 workers: 19042.755 ms  14580.335 ms  4462.420 ms
> 3 workers: 19318.254 ms  12668.912 ms  6649.342 ms
> 4 workers: 21732.910 ms  13596.788 ms  8136.122 ms
> 5 workers: 23486.846 ms  14533.409 ms  8953.437 ms
>
> This workstation has 4 CPU cores, so it is natural that nworkers=3
> records the peak performance on the ForeignScan portion. On the other
> hand, nworkers>1 also recorded non-negligible time consumption
> (probably in the Gather node?)
 :
> Further investigation will be needed.
>
It was a bug in my file_fdw patch. The ForeignScan node in the master
process was also kicked by the Gather node, but it lacked the coordination
information because of an oversight in the initialization at the
InitializeDSMForeignScan callback. As a result, the local ForeignScan node
was still executed after the coordinated background worker processes had
completed, and returned twice the number of rows.

With the revised patch, the results look reasonable to me.
             total         ForeignScan       diff
0 workers: 17592.498 ms  17564.457 ms    28.041 ms
1 workers: 12152.998 ms  11983.485 ms   169.513 ms
2 workers: 10647.858 ms  10502.100 ms   145.758 ms
3 workers:  9635.445 ms   9509.899 ms   125.546 ms
4 workers: 11175.456 ms  10863.293 ms   312.163 ms
5 workers: 12586.457 ms  12279.323 ms   307.134 ms

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

> -----Original Message-----
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Friday, January 29, 2016 8:51 AM
> To: Robert Haas
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CustomScan under the Gather node?
>
> > On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai wrote:
> > >> If I would make a proof-of-concept patch with the interface itself,
> > >> it seems to me file_fdw may be a good candidate for this enhancement.
> > >> It is not a field for postgres_fdw.
> > >>
> > > The attached patch is an enhancement of the FDW/CSP interface and a
> > > PoC feature of file_fdw to scan the source file partially. It was a
> > > smaller enhancement than I expected.
> > >
> > > It works as follows. This query tried to read 20M rows from a CSV
> > > file, using 3 background worker processes.
> > >
> > > postgres=# set max_parallel_degree = 3;
> > > SET
> > > postgres=# explain analyze select * from test_csv where id % 20 = 6;
> > >                          QUERY PLAN
> > > ----------------------------------------------------------------
> > >  Gather  (cost=1000.00..194108.60 rows=94056 width=52)
> > >          (actual time=0.570..19268.010 rows=200 loops=1)
> > >    Number of Workers: 3
> > >    ->  Parallel Foreign Scan on test_csv
> > >          (cost=0.00..183703.00 rows=94056 width=52)
> > >          (actual time=0.180..12744.655 rows=50 loops=4)
> > >          Filter: ((id % 20) = 6)
> > >          Rows Removed by Filter: 950
> > >          Foreign File: /tmp/testdata.csv
> > >          Foreign File Size: 1504892535
> > >  Planning time: 0.147 ms
> > >  Execution time: 19330.201 ms
> > > (9 rows)
> >
> > Could you try it not in parallel and then with 1, 2, 3, and 4 workers
> > and post the times for all?
> >
> The above query has 5% selectivity on the entire CSV file.
> Its execution times (total, ForeignScan only) are below:
>
>              total         ForeignScan       diff
> 0 workers: 17584.319 ms  17555.904 ms    28.415 ms
> 1 workers: 18464.476 ms  18110.968 ms   353.508 ms
> 2 workers: 19042.755 ms  14580.335 ms  4462.420 ms
> 3 workers: 19318.254 ms  12668.912 ms  6649.342 ms
> 4 workers: 21732.910 ms  13596.788 ms  8136.122 ms
> 5 workers: 23486.846 ms  14533.409 ms  8953.437 ms
>
> This workstation has 4 CPU cores, so it is natural that nworkers=3
> records the peak performance on the ForeignScan portion. On the other
> hand, nworkers>1 also recorded non-negligible time consumption
> (probably in the Gather node?)
>
> An interesting observation was that lower selectivity (1% and 0%) didn't
> change the result much. Something consumes CPU time other than file_fdw.
>
> * selectivity 1%
>              total         ForeignScan       diff
> 0 workers: 17573.572 ms  17566.875 ms     6.697 ms
> 1 workers: 18098.070 ms  18020.790 ms    77.280 ms
> 2 workers: 18
Re: [HACKERS] CustomScan under the Gather node?
> On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai wrote:
> >> If I would make a proof-of-concept patch with the interface itself,
> >> it seems to me file_fdw may be a good candidate for this enhancement.
> >> It is not a field for postgres_fdw.
> >>
> > The attached patch is an enhancement of the FDW/CSP interface and a
> > PoC feature of file_fdw to scan the source file partially. It was a
> > smaller enhancement than I expected.
> >
> > It works as follows. This query tried to read 20M rows from a CSV
> > file, using 3 background worker processes.
> >
> > postgres=# set max_parallel_degree = 3;
> > SET
> > postgres=# explain analyze select * from test_csv where id % 20 = 6;
> >                          QUERY PLAN
> > ----------------------------------------------------------------
> >  Gather  (cost=1000.00..194108.60 rows=94056 width=52)
> >          (actual time=0.570..19268.010 rows=200 loops=1)
> >    Number of Workers: 3
> >    ->  Parallel Foreign Scan on test_csv
> >          (cost=0.00..183703.00 rows=94056 width=52)
> >          (actual time=0.180..12744.655 rows=50 loops=4)
> >          Filter: ((id % 20) = 6)
> >          Rows Removed by Filter: 950
> >          Foreign File: /tmp/testdata.csv
> >          Foreign File Size: 1504892535
> >  Planning time: 0.147 ms
> >  Execution time: 19330.201 ms
> > (9 rows)
>
> Could you try it not in parallel and then with 1, 2, 3, and 4 workers
> and post the times for all?
>
The above query has 5% selectivity on the entire CSV file.
Its execution times (total, ForeignScan only) are below:

             total         ForeignScan       diff
0 workers: 17584.319 ms  17555.904 ms    28.415 ms
1 workers: 18464.476 ms  18110.968 ms   353.508 ms
2 workers: 19042.755 ms  14580.335 ms  4462.420 ms
3 workers: 19318.254 ms  12668.912 ms  6649.342 ms
4 workers: 21732.910 ms  13596.788 ms  8136.122 ms
5 workers: 23486.846 ms  14533.409 ms  8953.437 ms

This workstation has 4 CPU cores, so it is natural that nworkers=3 records
the peak performance on the ForeignScan portion. On the other hand,
nworkers>1 also recorded non-negligible time consumption (probably in the
Gather node?)

An interesting observation was that lower selectivity (1% and 0%) didn't
change the result much. Something consumes CPU time other than file_fdw.

* selectivity 1%
             total         ForeignScan       diff
0 workers: 17573.572 ms  17566.875 ms     6.697 ms
1 workers: 18098.070 ms  18020.790 ms    77.280 ms
2 workers: 18676.078 ms  14600.749 ms  4075.329 ms
3 workers: 18830.597 ms  12731.459 ms  6099.138 ms
4 workers: 21015.842 ms  13590.657 ms  7425.185 ms
5 workers: 22865.496 ms  14634.342 ms  8231.154 ms

* selectivity 0% (...so Gather didn't work hard actually)
             total         ForeignScan       diff
0 workers: 17551.011 ms  17550.811 ms     0.200 ms
1 workers: 18055.185 ms  18048.975 ms     6.210 ms
2 workers: 18567.660 ms  14593.974 ms  3973.686 ms
3 workers: 18649.819 ms  12671.429 ms  5978.390 ms
4 workers: 20619.184 ms  13606.715 ms  7012.469 ms
5 workers: 22557.575 ms  14594.420 ms  7963.155 ms

Further investigation will be needed.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

postgres=# explain analyze select * from test_csv where id % 100 = 100;
                          QUERY PLAN
-----------------------------------------------------------------
 Foreign Scan on test_csv  (cost=0.00..2158874.49 rows=94056 width=52)
                           (actual time=17550.811..17550.811 rows=0 loops=1)
   Filter: ((id % 100) = 100)
   Rows Removed by Filter: 2000
   Foreign File: /tmp/testdata.csv
   Foreign File Size: 1504892535
 Planning time: 1.175 ms
 Execution time: 17551.011 ms
(7 rows)

postgres=# SET max_parallel_degree = 1;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                          QUERY PLAN
-----------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52)
         (actual time=18054.651..18054.651 rows=0 loops=1)
   Number of Workers: 1
   ->  Parallel Foreign Scan on test_csv
         (cost=0.00..183703.00 rows=94056 width=52)
         (actual time=18048.975..18048.975 rows=0 loops=2)
         Filter: ((id % 100) = 100)
         Rows Removed by Filter: 2000
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.461 ms
 Execution time: 18055.185 ms
(9 rows)

postgres=# SET max_parallel_degree = 2;
SET
postgres=# explain analyze select * from test_csv where id % 100 = 100;
                          QUERY PLAN
-----------------------------------------------------------------
Re: [HACKERS] CustomScan under the Gather node?
On Thu, Jan 28, 2016 at 10:50 AM, Kouhei Kaigai wrote:
>> If I would make a proof-of-concept patch with the interface itself,
>> it seems to me file_fdw may be a good candidate for this enhancement.
>> It is not a field for postgres_fdw.
>>
> The attached patch is an enhancement of the FDW/CSP interface and a PoC
> feature of file_fdw to scan the source file partially. It was a smaller
> enhancement than I expected.
>
> It works as follows. This query tried to read 20M rows from a CSV file,
> using 3 background worker processes.
>
> postgres=# set max_parallel_degree = 3;
> SET
> postgres=# explain analyze select * from test_csv where id % 20 = 6;
>                          QUERY PLAN
> ----------------------------------------------------------------
>  Gather  (cost=1000.00..194108.60 rows=94056 width=52)
>          (actual time=0.570..19268.010 rows=200 loops=1)
>    Number of Workers: 3
>    ->  Parallel Foreign Scan on test_csv
>          (cost=0.00..183703.00 rows=94056 width=52)
>          (actual time=0.180..12744.655 rows=50 loops=4)
>          Filter: ((id % 20) = 6)
>          Rows Removed by Filter: 950
>          Foreign File: /tmp/testdata.csv
>          Foreign File Size: 1504892535
>  Planning time: 0.147 ms
>  Execution time: 19330.201 ms
> (9 rows)

Could you try it not in parallel and then with 1, 2, 3, and 4 workers
and post the times for all?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] CustomScan under the Gather node?
> If I would make a proof-of-concept patch with the interface itself,
> it seems to me file_fdw may be a good candidate for this enhancement.
> It is not a field for postgres_fdw.
>
The attached patch is an enhancement of the FDW/CSP interface and a PoC
feature of file_fdw to scan the source file partially. It was a smaller
enhancement than I expected.

It works as follows. This query tried to read 20M rows from a CSV file,
using 3 background worker processes.

postgres=# set max_parallel_degree = 3;
SET
postgres=# explain analyze select * from test_csv where id % 20 = 6;
                         QUERY PLAN
----------------------------------------------------------------
 Gather  (cost=1000.00..194108.60 rows=94056 width=52)
         (actual time=0.570..19268.010 rows=200 loops=1)
   Number of Workers: 3
   ->  Parallel Foreign Scan on test_csv
         (cost=0.00..183703.00 rows=94056 width=52)
         (actual time=0.180..12744.655 rows=50 loops=4)
         Filter: ((id % 20) = 6)
         Rows Removed by Filter: 950
         Foreign File: /tmp/testdata.csv
         Foreign File Size: 1504892535
 Planning time: 0.147 ms
 Execution time: 19330.201 ms
(9 rows)

I'm not 100% certain whether this implementation of file_fdw is reasonable
for partial reads; however, the callbacks hooked into the following
functions made it possible to implement parallel-aware custom logic based
on the coordination information.

> * ExecParallelEstimate
> * ExecParallelInitializeDSM
> * ExecParallelInitializeWorker

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Thursday, January 28, 2016 9:33 AM
> To: 'Robert Haas'
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] CustomScan under the Gather node?
>
> > On Tue, Jan 26, 2016 at 1:30 AM, Kouhei Kaigai wrote:
> > > What enhancement will be necessary to implement a similar feature to
> > > partial seq-scan using the custom-scan interface?
> > >
> > > It seems to me callbacks on the three points below are needed.
> > >  * ExecParallelEstimate
> > >  * ExecParallelInitializeDSM
> > >  * ExecParallelInitializeWorker
> > >
> > > Anything else?
> > > Does ForeignScan also need an equivalent enhancement?
> >
> > For postgres_fdw, running the query from a parallel worker would
> > change the transaction semantics.  Suppose you begin a transaction,
> > UPDATE data on the foreign server, and then run a parallel query.  If
> > the leader performs the ForeignScan it will see the uncommitted
> > UPDATE, but a worker would have to make its own connection, which
> > would not be part of the same transaction and which would therefore
> > not see the update.  That's a problem.
> >
> Ah, yes. As long as the FDW driver ensures the remote session has no
> uncommitted data, pg_export_snapshot() might provide us an opportunity;
> however, once a session writes something, the FDW driver has to
> prohibit it.
>
> > Also, for postgres_fdw, and many other FDWs I suspect, the assumption
> > is that most of the work is being done on the remote side, so doing
> > the work in a parallel worker doesn't seem super interesting.  Instead
> > of incurring transfer costs to move the data from remote to local, we
> > incur two sets of transfer costs: first remote to local, then worker
> > to leader.  Ouch.  I think a more promising line of inquiry is to try
> > to provide asynchronous execution when we have something like:
> >
> > Append
> > -> Foreign Scan
> > -> Foreign Scan
> >
> > ...so that we can return a row from whichever Foreign Scan receives
> > data back from the remote server first.
> >
> > So it's not impossible that an FDW author could want this, but mostly
> > probably not.  I think.
> >
> Yes, I have the same opinion. Local parallelism is likely not valuable
> for the class of FDWs that obtain data from a remote server (e.g.,
> postgres_fdw, ...), except for the case where the packing and unpacking
> cost over the network is the major bottleneck.
>
> On the other hand, it will be valuable for the class of FDWs that act
> as a wrapper around a local data structure, just as the current partial
> seq-scan does (e.g., file_fdw, ...).
> Their data source is not under transaction control, and the 'remote
> execution' of these FDWs is eventually performed on local computing
> resources.
>
> If I would make a proof-of-concept patch with the interface itself,
> it seems to me file_fdw may be a good candidate for this enhancement.
> It is not a field for postgres_fdw.
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei

pgsql-v9.6-parallel-cspfdw.v1.patch
Description: pgsql-v9.6-parallel-cspfdw.v1.patch
Re: [HACKERS] CustomScan under the Gather node?
> On Tue, Jan 26, 2016 at 1:30 AM, Kouhei Kaigai wrote:
> > What enhancement will be necessary to implement a similar feature to
> > partial seq-scan using the custom-scan interface?
> >
> > It seems to me callbacks on the three points below are needed.
> >  * ExecParallelEstimate
> >  * ExecParallelInitializeDSM
> >  * ExecParallelInitializeWorker
> >
> > Anything else?
> > Does ForeignScan also need an equivalent enhancement?
>
> For postgres_fdw, running the query from a parallel worker would
> change the transaction semantics.  Suppose you begin a transaction,
> UPDATE data on the foreign server, and then run a parallel query.  If
> the leader performs the ForeignScan it will see the uncommitted
> UPDATE, but a worker would have to make its own connection, which would
> not be part of the same transaction and which would therefore not see
> the update.  That's a problem.
>
Ah, yes. As long as the FDW driver ensures the remote session has no
uncommitted data, pg_export_snapshot() might provide us an opportunity;
however, once a session writes something, the FDW driver has to prohibit
it.

> Also, for postgres_fdw, and many other FDWs I suspect, the assumption
> is that most of the work is being done on the remote side, so doing
> the work in a parallel worker doesn't seem super interesting.  Instead
> of incurring transfer costs to move the data from remote to local, we
> incur two sets of transfer costs: first remote to local, then worker
> to leader.  Ouch.  I think a more promising line of inquiry is to try
> to provide asynchronous execution when we have something like:
>
> Append
> -> Foreign Scan
> -> Foreign Scan
>
> ...so that we can return a row from whichever Foreign Scan receives
> data back from the remote server first.
>
> So it's not impossible that an FDW author could want this, but mostly
> probably not.  I think.
>
Yes, I have the same opinion. Local parallelism is likely not valuable
for the class of FDWs that obtain data from a remote server (e.g.,
postgres_fdw, ...), except for the case where the packing and unpacking
cost over the network is the major bottleneck.

On the other hand, it will be valuable for the class of FDWs that act as
a wrapper around a local data structure, just as the current partial
seq-scan does (e.g., file_fdw, ...). Their data source is not under
transaction control, and the 'remote execution' of these FDWs is
eventually performed on local computing resources.

If I would make a proof-of-concept patch with the interface itself, it
seems to me file_fdw may be a good candidate for this enhancement.
It is not a field for postgres_fdw.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] CustomScan under the Gather node?
On Tue, Jan 26, 2016 at 1:30 AM, Kouhei Kaigai wrote:
> What enhancement will be necessary to implement a similar feature to
> partial seq-scan using the custom-scan interface?
>
> It seems to me callbacks on the three points below are needed.
>  * ExecParallelEstimate
>  * ExecParallelInitializeDSM
>  * ExecParallelInitializeWorker
>
> Anything else?
> Does ForeignScan also need an equivalent enhancement?

For postgres_fdw, running the query from a parallel worker would
change the transaction semantics.  Suppose you begin a transaction,
UPDATE data on the foreign server, and then run a parallel query.  If
the leader performs the ForeignScan it will see the uncommitted
UPDATE, but a worker would have to make its own connection, which would
not be part of the same transaction and which would therefore not see
the update.  That's a problem.

Also, for postgres_fdw, and many other FDWs I suspect, the assumption
is that most of the work is being done on the remote side, so doing
the work in a parallel worker doesn't seem super interesting.  Instead
of incurring transfer costs to move the data from remote to local, we
incur two sets of transfer costs: first remote to local, then worker
to leader.  Ouch.  I think a more promising line of inquiry is to try
to provide asynchronous execution when we have something like:

Append
-> Foreign Scan
-> Foreign Scan

...so that we can return a row from whichever Foreign Scan receives
data back from the remote server first.

So it's not impossible that an FDW author could want this, but mostly
probably not.  I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] CustomScan under the Gather node?
> -----Original Message-----
> From: Amit Kapila [mailto:amit.kapil...@gmail.com]
> Sent: Wednesday, January 27, 2016 2:30 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgresql.org
> Subject: ##freemail## Re: [HACKERS] CustomScan under the Gather node?
>
> On Tue, Jan 26, 2016 at 12:00 PM, Kouhei Kaigai wrote:
> >
> > Hello,
> >
> > What enhancement will be necessary to implement a similar feature to
> > partial seq-scan using the custom-scan interface?
> >
> > It seems to me callbacks on the three points below are needed.
> >  * ExecParallelEstimate
> >  * ExecParallelInitializeDSM
> >  * ExecParallelInitializeWorker
> >
> > Anything else?
>
> I don't think so.
>
> > Does ForeignScan also need an equivalent enhancement?
>
> I think this depends on the way ForeignScan is supposed to be
> parallelized; basically, if it needs to coordinate any information
> with its set of workers, then it will require such an enhancement.
>
After yesterday's post, I was reminded of a possible scenario around an
FDW that manages its own private storage, like cstore_fdw. A ForeignScan
node running on a columnar store (for example) will probably need
coordination information just as partial seq-scan does. It is a case very
similar to the implementation on local storage.

On the other hand, if we try to parallelize postgres_fdw (or others) with
background workers, I doubt we need this coordination information on the
local side. The remote query would have an additional qualifier to skip
the blocks already fetched for this purpose. At least, it does not need
any special enhancement.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] CustomScan under the Gather node?
On Tue, Jan 26, 2016 at 12:00 PM, Kouhei Kaigai wrote:
>
> Hello,
>
> What enhancement will be necessary to implement a similar feature to
> partial seq-scan using the custom-scan interface?
>
> It seems to me callbacks on the three points below are needed.
>  * ExecParallelEstimate
>  * ExecParallelInitializeDSM
>  * ExecParallelInitializeWorker
>
> Anything else?

I don't think so.

> Does ForeignScan also need an equivalent enhancement?

I think this depends on the way ForeignScan is supposed to be
parallelized; basically, if it needs to coordinate any information
with its set of workers, then it will require such an enhancement.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] CustomScan under the Gather node?
Hello,

What enhancement will be necessary to implement a similar feature to
partial seq-scan using the custom-scan interface?

It seems to me callbacks on the three points below are needed.
 * ExecParallelEstimate
 * ExecParallelInitializeDSM
 * ExecParallelInitializeWorker

Anything else?
Does ForeignScan also need an equivalent enhancement?

The background of my motivation is in the slides below:
http://www.slideshare.net/kaigai/sqlgpussd-english
(LT slides from the JPUG conference last Dec)

I'm investigating an SSD-to-GPU direct feature on top of the custom-scan
interface. It intends to load a bunch of data blocks on NVMe-SSD to GPU
RAM using peer-to-peer DMA, prior to loading the data onto CPU/RAM.
(Probably, only all-visible blocks shall be loaded, as in an index-only
scan.) Once the data blocks are on GPU RAM, we can reduce the rows that
would later be filtered out but would otherwise consume CPU RAM.

An expected major bottleneck is the CPU thread that issues the
peer-to-peer DMA requests to the device, rather than the GPU tasks. So,
utilization of parallel execution is a natural thought. However, a
CustomScan node that takes an underlying PartialSeqScan node is not
sufficient, because that loads the data blocks onto CPU RAM first, so
P2P DMA makes no sense there.

The expected "GpuSsdScan" on CustomScan will reference a shared
block-index to be incremented by multiple backends, then enqueue a P2P
DMA request (if all-visible) to the device driver. Then it receives only
the rows that are visible with respect to the scan qualifiers. It is
almost equivalent to SeqScan, but wants to bypass the heap layer to
utilize the SSD-to-GPU direct data transfer path.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei