On Mon, 10 Jan 2011 19:26:11 -0500
Tom Lane <t...@sss.pgh.pa.us> wrote:
> Shigeru HANADA <han...@metrosystems.co.jp> writes:
> > For the purpose of file_fdw, additional ResetCopyFrom() would be
> > necessary. I'm planning to include such changes in file_fdw patch. 
> > Please find attached partial patch for ResetCopyFrom(). Is there
> > anything else which should be done at reset?
> 
> Seems like it would be smarter to close and re-open the copy operation.
> Adding a reset function is just creating an additional maintenance
> burden and point of failure, for what seems likely to be a negligible
> performance benefit.

Agreed. fileReScan can be implemented by closing and re-opening the
copy operation, with some additional information stored in the FDW
private area. I withdraw the proposal.
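
As a rough illustration of the close/re-open idea only (this is not
PostgreSQL code, and FileScan, begin, end, rescan and demo are
hypothetical names), the sketch below keeps the file name and reader
options in the scan object, standing in for the FDW private area, so
that a rescan can simply end the scan and begin it again:

```python
import csv
import os
import tempfile


class FileScan:
    """Toy stand-in for a file-backed scan; not the file_fdw implementation."""

    def __init__(self, filename, **reader_options):
        # "Private area": everything needed to begin the scan again later.
        self.filename = filename
        self.reader_options = reader_options
        self._file = None
        self._reader = None
        self.begin()

    def begin(self):
        # Open the file and start reading from the first row.
        self._file = open(self.filename, newline="")
        self._reader = csv.reader(self._file, **self.reader_options)

    def next_row(self):
        # Return the next row, or None when the file is exhausted.
        return next(self._reader, None)

    def end(self):
        # Release the file handle; safe to call more than once.
        if self._file is not None:
            self._file.close()
            self._file = None
            self._reader = None

    def rescan(self):
        # Close and re-open instead of resetting reader state in place.
        self.end()
        self.begin()


def demo():
    # Write a two-row CSV file, scan it, rescan, and scan it again.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, newline=""
    ) as tmp:
        tmp.write("1,a\n2,b\n")
        path = tmp.name
    scan = FileScan(path, delimiter=",")
    try:
        first_pass = [scan.next_row(), scan.next_row()]
        scan.rescan()
        second_pass = [scan.next_row(), scan.next_row()]
    finally:
        scan.end()
        os.remove(path)
    return first_pass, second_pass
```

The point of the design is that rescan needs no reset logic of its
own: whatever begin does correctly, rescan inherits.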

> If you think it's not negligible, please show some proof of that before
> asking us to support such code.

Anyway, I've measured the overhead of re-opening by running a query
with an inner join between foreign tables copied from the pgbench
schema. I used the SELECT statement below:

    EXPLAIN (ANALYZE) SELECT count(*) FROM csv_accounts a JOIN
    csv_branches b ON (b.bid = a.bid);

Taking (Nested Loop - (Foreign Scan * 2)) as the measure, the overhead
of re-opening is around 0.048 ms per tuple (average of three
measurements).

I'll measure again once file_fdw is implemented. If ResetCopyFrom
significantly improves the performance of ReScan, I'll propose it as a
separate patch.

=========================================================================

The results of EXPLAIN ANALYZE are:

[using ResetCopyFrom]
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11717.02..11717.03 rows=1 width=0) (actual time=73357.655..73357.657 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11717.01 rows=1 width=0) (actual time=0.209..71424.059 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_accounts a  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.144..6998.497 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.008..0.037 rows=10 loops=1000000)
 Total runtime: 73358.135 ms
(11 rows)

[using EndCopyFrom + BeginCopyFrom]
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11717.02..11717.03 rows=1 width=0) (actual time=120724.138..120724.140 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11717.01 rows=1 width=0) (actual time=0.321..118583.681 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_accounts a  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.156..7208.968 rows=1000000 loops=1)
         ->  Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.046 rows=10 loops=1000000)
 Total runtime: 121118.792 ms
(11 rows)

Time: 121122.205 ms
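
As a quick cross-check on the single pair of runs shown above, the
difference in "Total runtime" can be divided by the number of inner
scan executions (loops=1000000 on csv_branches). This is a simpler
calculation than the per-node one described earlier and covers one
run rather than the average of three, so it is only a sanity check;
the constant and function names are illustrative.

```python
# Total runtimes in milliseconds, copied from the two plans above.
RESET_TOTAL_MS = 73358.135     # [using ResetCopyFrom]
REOPEN_TOTAL_MS = 121118.792   # [using EndCopyFrom + BeginCopyFrom]

# The inner Foreign Scan on csv_branches reports loops=1000000,
# i.e. it was executed once per outer tuple.
INNER_SCAN_LOOPS = 1_000_000


def overhead_per_loop_ms(reopen_ms, reset_ms, loops):
    """Extra time per inner scan execution when re-opening instead of resetting."""
    return (reopen_ms - reset_ms) / loops


overhead = overhead_per_loop_ms(REOPEN_TOTAL_MS, RESET_TOTAL_MS, INNER_SCAN_LOOPS)
print(f"{overhead:.3f} ms per outer tuple")  # prints "0.048 ms per outer tuple"
```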

=========================================================================

Regards,
--
Shigeru Hanada


