Re: Bulk Insert into PostgreSQL

2018-06-27 Thread Pavel Stehule
2018-06-27 13:18 GMT+02:00 Srinivas Karthik V :

> Hi,
> I am performing a bulk insert of 1TB TPC-DS benchmark data into PostgreSQL
> 9.4. It's taking around two days to insert 100 GB of data. Please let me
> know your suggestions to improve the performance. Below are the
> configuration parameters I am using:
> shared_buffers = 12GB
> maintenance_work_mem = 8GB
> work_mem = 1GB
> fsync = off
> synchronous_commit = off
> checkpoint_segments = 256
> checkpoint_timeout = 1h
> checkpoint_completion_target = 0.9
> checkpoint_warning = 0
> autovacuum = off
> Other parameters are set to default value. Moreover, I have specified the
> primary key constraint during table creation. This is the only possible
> index being created before data loading and I am sure there are no other
> indexes apart from the primary key column(s).
>

The main factor is using COPY instead of INSERTs.

Loading a 100 GB database should take only a few hours, not two days.
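
For illustration, a minimal sketch of loading one TPC-DS table with COPY
(the file path and the pipe delimiter are assumptions about how the flat
files were generated):

    -- Server-side COPY: the file must be readable by the postgres server process.
    COPY store_sales FROM '/data/tpcds/store_sales.dat'
        WITH (FORMAT csv, DELIMITER '|');

    -- Client-side alternative from psql, reading the file on the client machine:
    \copy store_sales FROM 'store_sales.dat' WITH (FORMAT csv, DELIMITER '|')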

Regards

Pavel


> Regards,
> Srinivas Karthik
>
>
>


RE: Bulk Insert into PostgreSQL

2018-06-27 Thread ROS Didier
Hi
   I suggest splitting the data to insert into several text files (as many as
the number of CPUs), creating the pg_background extension, and creating a main
transaction which launches x (= number of CPUs) autonomous transactions.
   Each one inserts the data from a specific text file via the COPY
command.
   NB: an autonomous transaction can commit.
   This would normally divide the duration of the import by the
number of CPUs.
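
For illustration, a rough sketch of that approach with pg_background
(the table and the per-chunk file naming are placeholders; each
pg_background_launch() call starts a COPY in its own background worker):

    CREATE EXTENSION pg_background;

    -- Launch one background worker per input file; each COPY runs and commits
    -- in its own transaction, independently of the launching session.
    -- Each call returns a worker PID; completion can be checked later with
    -- pg_background_result().
    SELECT n,
           pg_background_launch(
               format('COPY store_sales FROM %L WITH (FORMAT csv, DELIMITER ''|'')',
                      '/data/tpcds/store_sales_part_' || n || '.dat'))
    FROM generate_series(1, 8) AS n;   -- 8 = number of CPUs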

Best Regards


Didier ROS
Expertise SGBD
DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD
Nanterre Picasso - E2 565D (aile nord-est)
32 Avenue Pablo Picasso
92000 Nanterre
didier@edf.fr




From: skarthikv.i...@gmail.com [mailto:skarthikv.i...@gmail.com]
Sent: Wednesday, 27 June 2018 13:19
To: pgsql-hack...@postgresql.org
Subject: Bulk Insert into PostgreSQL

Hi,
I am performing a bulk insert of 1TB TPC-DS benchmark data into PostgreSQL 9.4. 
It's taking around two days to insert 100 GB of data. Please let me know your 
suggestions to improve the performance. Below are the configuration parameters 
I am using:
shared_buffers = 12GB
maintenance_work_mem = 8GB
work_mem = 1GB
fsync = off
synchronous_commit = off
checkpoint_segments = 256
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 0
autovacuum = off
Other parameters are set to their default values. Moreover, I have specified the
primary key constraint during table creation. This is the only index
created before data loading, and I am sure there are no other indexes
apart from the primary key column(s).

Regards,
Srinivas Karthik







Re: Bulk Insert into PostgreSQL

2018-06-27 Thread Don Seiler
On Wed, Jun 27, 2018 at 6:25 AM, Pavel Stehule 
wrote:

>
>
>> Other parameters are set to default value. Moreover, I have specified the
>> primary key constraint during table creation. This is the only possible
>> index being created before data loading and I am sure there are no other
>> indexes apart from the primary key column(s).
>>
>
When doing initial bulk data loads, I would suggest not applying ANY
constraints or indexes on the table until after the data is loaded.
Especially unique constraints/indexes; those will slow things down A LOT.
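
As a rough sketch of that order of operations (the TPC-DS store_sales table,
its key columns, and the default constraint name are used illustratively):

    -- Drop (or simply omit) the primary key before the load, add it afterwards.
    ALTER TABLE store_sales DROP CONSTRAINT IF EXISTS store_sales_pkey;

    COPY store_sales FROM '/data/tpcds/store_sales.dat' WITH (FORMAT csv, DELIMITER '|');

    -- Building the unique index once over the complete data is far cheaper than
    -- maintaining it row by row during the load.
    ALTER TABLE store_sales ADD PRIMARY KEY (ss_item_sk, ss_ticket_number);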


>
> The main factor is using COPY instead INSERTs.
>
>
+1 to COPY.


-- 
Don Seiler
www.seiler.us


Re: Bulk Insert into PostgreSQL

2018-06-29 Thread Srinivas Karthik V
I was using the COPY command to load. Removing the primary key constraint on
the table and then loading it helps a lot. In fact, a 400 GB table was
loaded and the primary key constraint was added in around 15 hours. Thanks for
the wonderful suggestions.

Regards,
Srinivas Karthik

On 28 Jun 2018 2:07 a.m., "Don Seiler"  wrote:

> On Wed, Jun 27, 2018 at 6:25 AM, Pavel Stehule 
> wrote:
>
>>
>>
>>> Other parameters are set to default value. Moreover, I have specified
>>> the primary key constraint during table creation. This is the only possible
>>> index being created before data loading and I am sure there are no other
>>> indexes apart from the primary key column(s).
>>>
>>
> When doing initial bulk data loads, I would suggest not applying ANY
> constraints or indexes on the table until after the data is loaded.
> Especially unique constraints/indexes, those will slow things down A LOT.
>
>
>>
>> The main factor is using COPY instead INSERTs.
>>
>>
> +1 to COPY.
>
>
> --
> Don Seiler
> www.seiler.us
>


Re: Bulk Insert into PostgreSQL

2018-06-30 Thread Craig Ringer
On 30 June 2018 at 06:47, Srinivas Karthik V 
wrote:

> I was using copy command to load. Removing the primary key constraint on
> the table and then loading it helps a lot. In fact, a 400GB table was
> loaded and the primary constraint was added in around 15 hours.  Thanks for
> the wonderful suggestions.
>
>
You can also gain a bit by running with wal_level = minimal. On newer
versions you can use UNLOGGED tables and then convert them to logged, but that
won't be an option on 9.4.
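
A sketch of both options (settings and names are illustrative; the
wal_level = minimal benefit for COPY applies when the table is created or
truncated in the same transaction, and ALTER TABLE ... SET LOGGED requires 9.5+):

    # postgresql.conf (restart required); on 9.4 this also requires
    # archive_mode = off and max_wal_senders = 0.
    wal_level = minimal

    -- On 9.5+ only: load into an unlogged table, then make it durable.
    CREATE UNLOGGED TABLE store_sales_load (LIKE store_sales INCLUDING DEFAULTS);
    -- ... COPY the data into store_sales_load ...
    ALTER TABLE store_sales_load SET LOGGED;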

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


RE: Bulk Insert into PostgreSQL

2018-07-01 Thread Tsunakawa, Takayuki
From: Srinivas Karthik V [mailto:skarthikv.i...@gmail.com]
> I was using copy command to load. Removing the primary key constraint on
> the table and then loading it helps a lot. In fact, a 400GB table was loaded
> and the primary constraint was added in around 15 hours.  Thanks for the
> wonderful suggestions.

400 GB / 15 hours = 7.6 MB/s

That looks too slow.  I experienced a similar slowness.  While one of our users tried
to INSERT (not COPY) a billion records, they reported that INSERTs slowed down by 10
times or so after inserting about 500 million records.  Periodic pstack runs on
Linux showed that the backend was busy in btree operations.  I didn't pursue
the cause due to other commitments, but there might be something to be improved.


Regards
Takayuki Tsunakawa





Re: Bulk Insert into PostgreSQL

2018-07-01 Thread Peter Geoghegan
On Sun, Jul 1, 2018 at 5:19 PM, Tsunakawa, Takayuki
 wrote:
> 400 GB / 15 hours = 7.6 MB/s
>
> That looks too slow.  I experienced a similar slowness.  While our user tried 
> to INSERT (not COPY) a billion record, they reported INSERTs slowed down by 
> 10 times or so after inserting about 500 million records.  Periodic pstack 
> runs on Linux showed that the backend was busy in btree operations.  I didn't 
> pursue the cause due to other businesses, but there might be something to be 
> improved.

What kind of data was indexed? Was it a bigserial primary key, or
something else?

-- 
Peter Geoghegan



RE: Bulk Insert into PostgreSQL

2018-07-01 Thread Tsunakawa, Takayuki
From: Peter Geoghegan [mailto:p...@bowt.ie]
> What kind of data was indexed? Was it a bigserial primary key, or
> something else?

Sorry, I don't remember it.  But the table was for storing some machine usage 
data, and I don't think any sequence was used in the index.

According to my faint memory, iostat showed many reads on the database storage, 
and correspondingly pstack showed ReadBufferExtended during the btree 
operations.  shared_buffers was multiple GBs.  I wondered why btree operations 
didn't benefit from the caching of non-leaf nodes.

Regards
Takayuki Tsunakawa





Re: Bulk Insert into PostgreSQL

2018-07-03 Thread Srinivas Karthik V
@Peter: I was indexing the primary key of all the tables in TPC-DS. Some of
the fact tables have multiple columns as part of the primary key. Also, most
of them are of numeric type.

On Mon, Jul 2, 2018 at 7:09 AM, Peter Geoghegan  wrote:

> On Sun, Jul 1, 2018 at 5:19 PM, Tsunakawa, Takayuki
>  wrote:
> > 400 GB / 15 hours = 7.6 MB/s
> >
> > That looks too slow.  I experienced a similar slowness.  While our user
> tried to INSERT (not COPY) a billion record, they reported INSERTs slowed
> down by 10 times or so after inserting about 500 million records.  Periodic
> pstack runs on Linux showed that the backend was busy in btree operations.
> I didn't pursue the cause due to other businesses, but there might be
> something to be improved.
>
> What kind of data was indexed? Was it a bigserial primary key, or
> something else?
>
> --
> Peter Geoghegan
>


Re: Bulk Insert into PostgreSQL

2018-07-03 Thread Ashwin Agrawal
On Sat, Jun 30, 2018 at 6:27 AM Craig Ringer  wrote:

>
> You can also gain a bit by running with wal_level = minimal. On newer
> version you can use UNLOGGED tables then convert them to logged, but that
> won't be an option for 9.4.
>

Curious to know more about this: is it also faster with a standby, or is this
option only faster without a standby?


Re: Bulk Insert into PostgreSQL

2018-07-04 Thread Peter Geoghegan
On Tue, Jul 3, 2018 at 4:34 PM, Srinivas Karthik V
 wrote:
> @Peter: I was indexing the primary key of all the tables in tpc-ds. Some of
> the fact tables has multiple columns as part of the primary key. Also, most
> of them are numeric type.

Please see my mail to -hackers on suffix truncation:
https://postgr.es/m/CAH2-Wzn5XbCzk6u0GL+uPnCp1tbrp2pJHJ=3byt4yq0_zzh...@mail.gmail.com

Perhaps this is related in some way, since in both cases we're talking
about a composite index on varlena-type columns, where the types have
expensive comparisons.

-- 
Peter Geoghegan



Re: Bulk Insert into PostgreSQL

2018-07-05 Thread Srinivas Karthik V
Thanks for the link!

Separately, when I try to create an index on a single column of a table
which is 400 GB in size, it takes roughly 7 hours. The index is created
on only one column, which is not a primary key. The query I am using is
CREATE INDEX ON table (colname). I would appreciate your suggestions on
this as well. The configuration parameters are:

shared_buffers = 12GB
maintenance_work_mem = 8GB
work_mem = 1GB
fsync = off
synchronous_commit = off
checkpoint_segments = 256
checkpoint_timeout = 1h
checkpoint_completion_target = 0.9
checkpoint_warning = 0
autovacuum = off
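
For reference, a minimal sketch of such a build (table and column names are
placeholders); on 9.4 there is no parallel index build (that arrived in
PostgreSQL 11), so raising maintenance_work_mem for the session is the main knob:

    SET maintenance_work_mem = '8GB';              -- sort memory for this index build
    CREATE INDEX ON store_sales (ss_sold_date_sk); -- index name is auto-generated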

Regards,
Srinivas Karthik

On Wed, Jul 4, 2018 at 10:27 PM, Peter Geoghegan  wrote:

> On Tue, Jul 3, 2018 at 4:34 PM, Srinivas Karthik V
>  wrote:
> > @Peter: I was indexing the primary key of all the tables in tpc-ds. Some
> of
> > the fact tables has multiple columns as part of the primary key. Also,
> most
> > of them are numeric type.
>
> Please see my mail to -hackers on suffix truncation:
> https://postgr.es/m/CAH2-Wzn5XbCzk6u0GL+uPnCp1tbrp2pJHJ=3bYT4yQ0_
> zzh...@mail.gmail.com
>
> Perhaps this is related in some way, since in both cases we're talking
> about a composite index on varlena-type columns, where the types have
> expensive comparisons.
>
> --
> Peter Geoghegan
>

