On Tue, Feb 25, 2020 at 9:30 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> On Sun, Feb 23, 2020 at 05:09:51PM -0800, Andres Freund wrote:
> >Hi,
> >
> >> The one piece of information I'm missing here is at least a very rough
> >> quantification of the individual steps of CSV processing - for example
> >> if parsing takes only 10% of the time, it's pretty pointless to start by
> >> parallelising this part and we should focus on the rest. If it's 50% it
> >> might be a different story. Has anyone done any measurements?
> >
> >Not recently, but I'm pretty sure that I've observed CSV parsing to be
> >way more than 10%.
> >
>
> Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
> so I still think we need to do some measurements first.
>
Agreed.

> I'm willing to
> do that, but (a) I doubt I'll have time for that until after 2020-03,
> and (b) it'd be good to agree on some set of typical CSV files.
>

Right, I don't know what the best way to define that is. I can think of
the below tests.

1. A table with 10 columns (with datatypes such as integer, date, and
text). It has one index (unique/primary). Load it with 1 million rows
(the data should probably be 5-10 GB).

2. A table with 10 columns (with datatypes such as integer, date, and
text). It has three indexes, one of which can be unique/primary. Load it
with 1 million rows (the data should probably be 5-10 GB).

3. A table with 10 columns (with datatypes such as integer, date, and
text). It has three indexes, one of which can be unique/primary. It has
before and after triggers. Load it with 1 million rows (the data should
probably be 5-10 GB).

4. A table with 10 columns (with datatypes such as integer, date, and
text). It has five or six indexes, one of which can be unique/primary.
Load it with 1 million rows (the data should probably be 5-10 GB).

For all of these tests, we can check how much time we spend reading and
parsing the CSV file vs. the rest of the execution. A rough sketch of
what the setup for test 1 could look like is below.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
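
As a starting point, test 1 might look something like the following
(the table name, exact column mix, and CSV path are only illustrative,
not something we've agreed on; timing is just psql's \timing, the finer
read/parse split would still need profiling):

-- Illustrative schema for test 1: 10 columns, one unique/primary index
CREATE TABLE copy_test (
    id      bigint PRIMARY KEY,
    created date,
    c3      integer,
    c4      integer,
    c5      integer,
    c6      text,
    c7      text,
    c8      text,
    c9      text,
    c10     text
);

-- Load 1 million pre-generated rows from a CSV file and time the COPY
\timing on
COPY copy_test FROM '/tmp/copy_test.csv' WITH (FORMAT csv);
\timing off

Tests 2-4 would only differ in the extra indexes and triggers created
before the COPY. For the read/parse vs. rest-of-execution split, I
think we'd profile the backend during the COPY (e.g. with perf) and
attribute the time spent in the input/parsing paths of copy.c (the
CopyReadLine/NextCopyFrom side, as I understand it) vs. heap insertion,
index maintenance, and trigger execution.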