On that particular patch, as Robert mentioned, only the parsing has changed.
In the case of \copy, the parsing is much lighter than before in psql
(it remains the same in the server). The bigger the COPY operation, the
less you will see the impact of the parsing, since it is done only once
for the entire operation.
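For instance (a rough sketch, with made-up file names), the client-side
parsing cost is easiest to see with a tiny \copy, while a large one
amortizes it over the whole transfer:

\timing
-- parsing dominates: only one row of data per invocation
\copy t FROM '/tmp/one_row.csv' WITH CSV
-- parsing is amortized: one invocation, many rows
\copy t FROM '/tmp/million_rows.csv' WITH CSV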
Emmanuel
Dan Colish wrote:
On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote:
Greg Smith wrote:
On Thu, 17 Sep 2009, Dan Colish wrote:
- Performance appears to be the same, although I don't have a good way
  of testing this at the moment
Here's what I do to generate simple COPY performance test cases:
CREATE TABLE t (i integer);
INSERT INTO t SELECT x FROM generate_series(1,100000) AS x;
\timing
COPY t TO '/some/file' WITH [options];
BEGIN;
TRUNCATE TABLE t;
COPY t FROM '/some/file' WITH [options];
COMMIT;
You can adjust the size of the generated table based on whether you
want to minimize (small number) or maximize (big number) the impact of
the setup overhead relative to actual processing time. Big numbers make
sense if the patch changes per-row processing; small ones make sense if
it's mainly COPY setup that's been changed and you just want a small
bit of data to test against.
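For instance (only the row counts differ from the example above):

-- emphasize per-row cost: large table
INSERT INTO t SELECT x FROM generate_series(1,10000000) AS x;
-- emphasize setup overhead: small table
INSERT INTO t SELECT x FROM generate_series(1,1000) AS x;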
A table with a single column like this is a good test case for seeing
whether the per-row impact has gone up. You'd want something with a
wider row for other types of performance tests.
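A wider-row variant might look like this (the column choices are just
an illustration):

CREATE TABLE t_wide (i integer, t text, f float8, ts timestamptz);
INSERT INTO t_wide
  SELECT x, repeat('x', 100), random(), now()
  FROM generate_series(1,100000) AS x;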
The reason for the BEGIN/COMMIT there is that this form uses an
optimization that lowers WAL volume when doing the COPY insertion
(COPY into a table truncated in the same transaction can skip most WAL
writes as long as archiving is off), which makes it more likely you'll
be testing performance of the right thing.
I usually prefer to test with a table that is more varied than anything
you can make with generate_series. When I tested my ragged copy patch
the other day I copied 1,000,000 rows out of a large table with a
mixture of dates, strings, numbers and nulls.
But then, that patch has a (tiny) per-field overhead, so I wanted to
make sure that was well represented in the test.
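If you don't have such a table handy, something along these lines at
least gives a mix of dates, strings, numbers and nulls (the column
definitions are purely illustrative):

CREATE TABLE t_varied AS
  SELECT x AS id,
         now() - (x || ' seconds')::interval AS ts,
         md5(x::text) AS label,
         CASE WHEN x % 10 = 0 THEN NULL
              ELSE random() * 1000 END AS measure
  FROM generate_series(1,1000000) AS x;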
You are certainly right about wrapping it in BEGIN/TRUNCATE/COMMIT
(and when you do, make sure that archiving is not on).
You probably want to make sure that the file is not on the same disk as
the database, to avoid disk contention. Or, better, make sure that it is
in OS file system cache, or on a RAM disk.
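For example (the tmpfs path is Linux-specific and just an
illustration), check that archiving is off and point COPY at a
RAM-backed location:

SHOW archive_mode;
COPY t TO '/dev/shm/copytest.dat';
COPY t FROM '/dev/shm/copytest.dat';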
cheers
andrew
If someone with a more significant setup can run tests, that would be
ideal. I only have my laptop, which has a single disk and is fairly
underpowered. That said, here are my results from running the script
above; it looks like the patch improves performance. I would be really
interested to see results on a larger data set and heavier iron.
--
--Dan
Without Patch:
CREATE TABLE
INSERT 0 100000
Timing is on.
COPY 100000
Time: 83.273 ms
BEGIN
Time: 0.412 ms
TRUNCATE TABLE
Time: 0.357 ms
COPY 100000
Time: 140.911 ms
COMMIT
Time: 4.909 ms
With Patch:
CREATE TABLE
INSERT 0 100000
Timing is on.
COPY 100000
Time: 80.205 ms
BEGIN
Time: 0.351 ms
TRUNCATE TABLE
Time: 0.346 ms
COPY 100000
Time: 124.303 ms
COMMIT
Time: 4.130 ms
--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com