Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-12 Thread Julio Cesar Sánchez González

On Thu, 2007-11-08 at 13:01 -0500, Tom Lane wrote:
> "Krasimir Hristozov \(InterMedia Ltd\)" <[EMAIL PROTECTED]> writes:
> > We need to import data from a relatively large MySQL database into an
> > existing PostgreSQL database, using a PHP5 script that SELECTs from MySQL
> > and INSERTs in PostgreSQL. A part of the import involves moving about
> > 1,300,000 records from one MySQL table to one of our PostgreSQL tables. The
> > problem is that the insert performance inevitably deteriorates as the number
> > of inserts increases.
> 
> Are you *certain* you've gotten rid of all the indexes and foreign keys?
> A simple insert ought to be pretty much constant-time in Postgres, so it
> seems to me that you've missed something.
> 
> It also seems possible that you are wrong to disregard PHP as a possible
> source of the problem.  Have you tried watching the PHP and PG backend
> processes with "top" (or similar tool) to see who's consuming CPU time
> and/or memory space?
> 
>   regards, tom lane
> 


Hi all,

Why not try Perl DBI instead? It may work better.

-- 
Regards,

Julio Cesar Sánchez González.

--
Now I have become Death, the destroyer of worlds.
I am Death who takes everything away, the source of things to come.

www.sistemasyconectividad.com.mx | http://darkavngr.blogspot.com/





Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-09 Thread Tomas Vondra

> > Try one of these:
> >
> > a) don't use INSERT statements, use COPY instead
> >
> > b) from time to time run ANALYZE on the "public" table (say 1000
> >    inserts, then one analyze)
> >
> > c) create the table without constraints (primary / foreign keys in this
> >    case), import all the data, and then create the constraints
> >
> > (b) and (c) may be combined, i.e. import without constraints and
> > analyze from time to time. I'd probably try (a) first, anyway.
> >
> > Try to gather some more statistics - is the problem related to CPU or
> > I/O? Use 'dstat' for example - this might give you a hint in case
> > the advice mentioned above doesn't help.
>
> I agree with all but b).


You're right - this combination (no constraints or indices and running 
ANALYZE at the same time) won't help. The original list of suggestions was 
a little different, but I changed it and didn't recheck the following 
paragraphs ...


Tomas



Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-09 Thread Krasimir Hristozov (InterMedia Ltd)
Thanks to all who responded. Using COPY instead of INSERT really solved the 
problem - the whole process took about 1h 20min on an indexed table, with 
constraints (which is close to our initial expectations). We're performing some 
additional tests now. I'll post some more observations when finished.
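
For reference, a minimal sketch of driving COPY from PHP 5 with the pgsql 
extension's pg_query()/pg_put_line()/pg_end_copy(). The destination table and 
columns come from the DDL quoted later in this thread; the $my result set, the 
source column names, and the copy_escape() helper are placeholders and 
simplifications, not the actual import code.

<?php
// $my is an existing MySQL result resource, $pg an existing PostgreSQL
// connection; the source column names below are placeholders.

// Escape one value for the COPY text format: \N for NULL, backslash-escape
// backslash, tab, newline and carriage return.
function copy_escape($v) {
    if ($v === null) {
        return '\N';
    }
    return str_replace(array("\\", "\t", "\n", "\r"),
                       array("\\\\", "\\t", "\\n", "\\r"), $v);
}

pg_query($pg, 'COPY sp_thread_replies (thread_id, body, ts_added, user_id) FROM STDIN');

while ($row = mysql_fetch_assoc($my)) {
    pg_put_line($pg, implode("\t", array(
        copy_escape($row['thread_id']),
        copy_escape($row['body']),
        copy_escape($row['ts_added']),
        copy_escape($row['user_id']),
    )) . "\n");
}

pg_put_line($pg, "\\.\n");   // COPY end-of-data marker
pg_end_copy($pg);
?>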

Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-09 Thread Márcio Geovani Jasinski
Hello Krasimir,

You got a lot of good advice above and I would like to add one more point:

d) Make sure your PHP code is not recursive. As you said, the memory usage is
stable, so I assume your method is iterative.
A recursive method would certainly add a little time to each insert and
use more memory.
But an iterative method must be written correctly so it is called just once;
maybe your code is running much more often than it needs to.

Pay attention to Tomas's advice, and after that (I agree with Chris) "there
should be no reason for loading data to get more costly as
the size of the table increases" - please check your code.

I did some experiments a long time ago with 4 data sets with a lot of BLOBs.
I used PHP code doing SELECT/INSERT from Postgres to Postgres, and the time
wasn't constant, but it wasn't as bad as in your case. (And I didn't apply
Tomas's a, b and c advice.)

Good luck
-- 
Márcio Geovani Jasinski


Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-08 Thread Chris Browne
[EMAIL PROTECTED] ("=?UTF-8?B?VG9tw6HFoSBWb25kcmE=?=") writes:
> Try one of these:
>
> a) don't use INSERT statements, use COPY instead
>
> b) from time to time run ANALYZE on the "public" table (say 1000
>inserts, then one analyze)
>
> c) create the table without constraints (primary / foreign keys in this
>case), import all the data, and then create the constraints
>
> (b) and (c) may be combined, i.e. import without constraints and
> analyze from time to time. I'd probably try (a) first, anyway.
>
> Try to gather some more statistics - is the problem related to CPU or
> I/O? Use 'dstat' for example - this might give you a hint in case
> the advice mentioned above doesn't help.

I agree with all but b).

- If you use COPY, that copies data "in bulk" which is *way* faster
  than submitting individual INSERT statements that must each be parsed.

  So I certainly agree with a).

- There are two prime reasons to expect the data load to slow down:

  1. Because adding entries to the index gets more expensive the
 larger the table gets;

  2. Because searching through foreign key constraints tends to get
 more expensive as the target table grows.

  Those point to doing c).

If you put off evaluating indices and foreign key constraints until
all of the data is loaded, there should be no need to run ANALYZE
during the COPY process.

And there should be no reason for loading data to get more costly as
the size of the table increases.
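
As a rough illustration of c) for the table in this thread (constraint, table 
and column names are taken from the DDL quoted below; whether you issue these 
from psql or from the PHP script is immaterial), the load could be bracketed 
like this:

<?php
// Drop the primary key and foreign keys before the bulk load ...
pg_query($pg, 'ALTER TABLE sp_thread_replies DROP CONSTRAINT sp_thread_replies_threads_fk');
pg_query($pg, 'ALTER TABLE sp_thread_replies DROP CONSTRAINT sp_thread_replies_users_fk');
pg_query($pg, 'ALTER TABLE sp_thread_replies DROP CONSTRAINT sp_thread_replies_pkey');

// ... COPY / INSERT all the rows here ...

// ... and recreate them once, after the data is in.
pg_query($pg, 'ALTER TABLE sp_thread_replies
                 ADD CONSTRAINT sp_thread_replies_pkey PRIMARY KEY (id)');
pg_query($pg, 'ALTER TABLE sp_thread_replies
                 ADD CONSTRAINT sp_thread_replies_threads_fk FOREIGN KEY (thread_id)
                 REFERENCES sp_threads (id) ON DELETE CASCADE');
pg_query($pg, 'ALTER TABLE sp_thread_replies
                 ADD CONSTRAINT sp_thread_replies_users_fk FOREIGN KEY (user_id)
                 REFERENCES sp_users (id)');
?>

Recreating the primary key builds its index in a single pass, and each ADD 
CONSTRAINT ... FOREIGN KEY validates the whole table once instead of doing a 
lookup per inserted row.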
-- 
let name="cbbrowne" and tld="linuxfinances.info" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/advocacy.html
Rules of the Evil Overlord #116.  "If I capture the hero's starship, I
will keep it in  the landing bay with the ramp down,  only a few token
guards on  duty and a ton  of explosives set to  go off as  soon as it
clears the blast-range." 



Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-08 Thread Bill Moran
In response to "Krasimir Hristozov \(InterMedia Ltd\)" <[EMAIL PROTECTED]>:

> We need to import data from a relatively large MySQL database into an
> existing PostgreSQL database, using a PHP5 script that SELECTs from MySQL
> and INSERTs in PostgreSQL. A part of the import involves moving about
> 1,300,000 records from one MySQL table to one of our PostgreSQL tables. The
> problem is that the insert performance inevitably deteriorates as the number
> of inserts increases.
> 
> We tried different approaches:
> 
>  * selecting only parts of the source table data based on a certain
> condition
>  * selecting all of the source data
> 
> coupled with either of these:
> 
>  * inserting without explicit transactions
>  * inserting all the data in a single transaction
>  * inserting the data in partial transactions of about 100, 1000, 10,000,
> 100,000 inserts each
> 
> While there were performance benefits in some of the cases (selecting all
> the data and inserting in transaction packets of about 1000 each being the
> fastest), the problem was that it still deteriorated as the import
> progressed.
> 
> We tried removing all foreign keys and indices from the postgres table,
> still gained performance, but it deteriorated as well.
> 
> The latest (and best performing) test we did was under the following
> conditions:
> 
>  * 11851 pre-existing records in the destination table
>  * the table was vacuumed just before the import
>  * all foreign keys and indices were removed from the destination table
>  * selected all of the data from the source table at once
>  * inserted in transactions of 1000 inserts each
> 
> We displayed time statistics on each 100 inserts. The process started at
> about 1 second per 100 inserts. This estimated to about 4 hours for the
> entire process. 14 hours later it had imported about a quarter of the data
> (a bit more than 330,000 records), and 100 inserts now took nearly 40
> seconds.
> 
> We tested reading from MySQL alone, without inserting the data in Postgres.
> All records were read in about a minute and a half, so MySQL performance is
> not a part of the issue. The PHP script selects the MySQL data, fetches rows
> sequentially, occasionally performs a couple of selects against PostgreSQL
> data (which is cached in a PHP array to reduce the DB operations; no more
> than 8 array elements, integer keys, integer data), and inserts into
> PostgreSQL. The algorithm seems to be linear in nature and performance
> deterioration most probably doesn't have to do with the PHP code.
> 
> Has anyone had an issue like this, and perhaps a suggestion for a possible
> cause and solution? Is it common for PostgreSQL to grow so slow as the
> amount of data in the tables increases? If so, is it just the insert
> operation or all kinds of queries? Isn't 300,000 records too low a threshold
> for such performance deterioration?
> 
> Here are some technical details, that might be helpful:
> 
>  * PHP, MySQL and PostgreSQL all work on the same server, sharing the same
> memory and hard drive.

This makes it very difficult to blame PostgreSQL.  If the insert process
is CPU bound, and PHP is using a ton of CPU, then PG will be starved.
You kinda contradict yourself, saying PG is not starved, then saying
that the CPU is maxed out.  In any event, having all three on one machine
will make it more fun to isolate where the actual bottleneck is.

>  * the server runs FreeBSD 5.3-RELEASE on an AMD Athlon(tm) 64 Processor
> 3000+ (2GHz K8 class CPU) with 1GB RAM

This is another problem.  5.3 has the worst performance of any version
of FreeBSD I've ever used.  Even downgrading to 4.11 (not recommended)
would produce a performance improvement, but you should get this
system to 5.5 (at least) or 6.2 (preferably).

>  * the software versions installed are Apache 1.3.37, PHP 5.2.1, MySQL
> 4.1.22, PostgreSQL 8.1.8
>  * postgresql.conf variables other than defaults are: max_connections = 40,
> shared_buffers = 1000 (this is the default)

1000 shared_buffers is pretty low for any real work.  While your tables
aren't huge, they're big enough to warrant more shared_buffers.  However,
you've only got 1G of RAM on this system to share between two DB servers,
which is going to constrain you a good bit.

There are other settings important to insert performance that you haven't
mentioned.  checkpoint_segments and the like, for example.  You also don't
describe your disk subsystem, it's entirely possible you've filled up the
cache on the disk controllers (which is why it looked initially fast) and
now are hitting up against the max speed the disks can do.  With only 1G
of RAM, it's possible that MySQL is reading, PostgreSQL is writing, and
PHP is swapping.  It doesn't take a lot of disk contention to flush
performance down the toilet.  See what iostat says.  Even better, use
top in "m" mode (hit 'm' after top starts) to see how much IO each process
is using (I believe that was added in 5.X, but it may only be available
in 6.X versions).
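
For what it's worth, a hedged starting point on an 8.1 box with 1 GB of RAM 
shared with MySQL, Apache and PHP might look something like the following; 
the numbers are illustrative, not a recipe:

# postgresql.conf (PostgreSQL 8.1) -- illustrative values only
shared_buffers = 10000         # in 8 kB pages, roughly 80 MB; the default 1000 is only ~8 MB
checkpoint_segments = 16       # default is 3; less frequent checkpoints during the bulk load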

Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-08 Thread Tom Lane
"Krasimir Hristozov \(InterMedia Ltd\)" <[EMAIL PROTECTED]> writes:
> We need to import data from a relatively large MySQL database into an
> existing PostgreSQL database, using a PHP5 script that SELECTs from MySQL
> and INSERTs in PostgreSQL. A part of the import involves moving about
> 1,300,000 records from one MySQL table to one of our PostgreSQL tables. The
> problem is that the insert performance inevitably deteriorates as the number
> of inserts increases.

Are you *certain* you've gotten rid of all the indexes and foreign keys?
A simple insert ought to be pretty much constant-time in Postgres, so it
seems to me that you've missed something.

It also seems possible that you are wrong to disregard PHP as a possible
source of the problem.  Have you tried watching the PHP and PG backend
processes with "top" (or similar tool) to see who's consuming CPU time
and/or memory space?

regards, tom lane



Re: [GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-08 Thread Tomáš Vondra

Try one of these:

a) don't use INSERT statements, use COPY instead

b) from time to time run ANALYZE on the "public" table (say 1000
   inserts, then one analyze)

c) create the table without constraints (primary / foreign keys in this
   case), import all the data, and then create the constraints

(b) and (c) may be combined, i.e. import without constraints and 
analyze from time to time. I'd probably try (a) first, anyway.
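
Purely as an illustration of (b), inside whatever PHP loop performs the 
inserts ($pg and the $i row counter are placeholders, not code from the 
original script):

<?php
if (++$i % 1000 == 0) {
    // refresh planner statistics for the target table every 1000 inserts
    pg_query($pg, 'ANALYZE sp_thread_replies');
}
?>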


Try to gather some more statistics - is the problem related to CPU or 
I/O? Use 'dstat' for example - this might give you a hint in case 
the advice mentioned above doesn't help.


Tomas


[GENERAL] INSERT performance deteriorates quickly during a large import

2007-11-08 Thread Krasimir Hristozov (InterMedia Ltd)

We need to import data from a relatively large MySQL database into an
existing PostgreSQL database, using a PHP5 script that SELECTs from MySQL
and INSERTs in PostgreSQL. A part of the import involves moving about
1,300,000 records from one MySQL table to one of our PostgreSQL tables. The
problem is that the insert performance inevitably deteriorates as the number
of inserts increases.

We tried different approaches:

* selecting only parts of the source table data based on a certain
condition
* selecting all of the source data

coupled with either of these:

* inserting without explicit transactions
* inserting all the data in a single transaction
* inserting the data in partial transactions of about 100, 1000, 10,000,
100,000 inserts each

While there were performance benefits in some of the cases (selecting all
the data and inserting in transaction packets of about 1000 each being the
fastest), the problem was that it still deteriorated as the import
progressed.

We tried removing all foreign keys and indices from the postgres table,
still gained performance, but it deteriorated as well.

The latest (and best performing) test we did was under the following
conditions:

* 11851 pre-existing records in the destination table
* the table was vacuumed just before the import
* all foreign keys and indices were removed from the destination table
* selected all of the data from the source table at once
* inserted in transactions of 1000 inserts each

We displayed time statistics on each 100 inserts. The process started at
about 1 second per 100 inserts. This estimated to about 4 hours for the
entire process. 14 hours later it had imported about a quarter of the data
(a bit more than 330,000 records), and 100 inserts now took nearly 40
seconds.

We tested reading from MySQL alone, without inserting the data in Postgres.
All records were read in about a minute and a half, so MySQL performance is
not a part of the issue. The PHP script selects the MySQL data, fetches rows
sequentially, occasionally performs a couple of selects against PostgreSQL
data (which is cached in a PHP array to reduce the DB operations; no more
than 8 array elements, integer keys, integer data), and inserts into
PostgreSQL. The algorithm seems to be linear in nature, and the performance
deterioration most probably doesn't have to do with the PHP code.
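
A minimal sketch of that batching, assuming an existing MySQL result set 
($my) and PostgreSQL connection ($pg); the destination columns follow the 
DDL below, while the source column names are placeholders:

<?php
$i = 0;
pg_query($pg, 'BEGIN');
while ($row = mysql_fetch_assoc($my)) {
    pg_query_params($pg,
        'INSERT INTO sp_thread_replies (thread_id, body, ts_added, user_id)
         VALUES ($1, $2, $3, $4)',
        array($row['thread_id'], $row['body'], $row['ts_added'], $row['user_id']));
    if (++$i % 1000 == 0) {        // commit every 1000 rows, then start a new transaction
        pg_query($pg, 'COMMIT');
        pg_query($pg, 'BEGIN');
    }
}
pg_query($pg, 'COMMIT');
?>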

Has anyone had an issue like this, and perhaps a suggestion for a possible
cause and solution? Is it common for PostgreSQL to grow so slow as the
amount of data in the tables increases? If so, is it just the insert
operation or all kinds of queries? Isn't 300,000 records too low a threshold
for such performance deterioration?

Here are some technical details, that might be helpful:

* PHP, MySQL and PostgreSQL all work on the same server, sharing the same
memory and hard drive.
* the server runs FreeBSD 5.3-RELEASE on an AMD Athlon(tm) 64 Processor
3000+ (2GHz K8 class CPU) with 1GB RAM
* the software versions installed are Apache 1.3.37, PHP 5.2.1, MySQL
4.1.22, PostgreSQL 8.1.8
* postgresql.conf variables other than defaults are: max_connections = 40,
shared_buffers = 1000 (this is the default)
* we have also tried these on another server with Red Hat Enterprise Linux
ES release 4 (Linux 2.6.9-42.0.3.ELsmp) on 2xDual Core AMD Opteron(tm)
Processor 270 (4x2GHz logical CPUs) with 2GB RAM
* both servers run in x86_64 mode, PostgreSQL footprint in memory stays
relatively small, CPU usage maxes out on import, there is no resource
starvation in any way

DDL statement for the creation of the PostgreSQL table in question:

CREATE TABLE "public"."sp_thread_replies" (
  "id" SERIAL,
  "thread_id" INTEGER NOT NULL,
  "body" TEXT NOT NULL,
  "ts_added" INTEGER DEFAULT 0 NOT NULL,
  "user_id" INTEGER NOT NULL,
  "thread_offset" INTEGER DEFAULT 0,
  "approved" SMALLINT DEFAULT 1,
  "title" TEXT,
  "deleted" SMALLINT DEFAULT 0,
  "edit_reason" VARCHAR(255),
  "edit_user_id" INTEGER,
  "edit_time" INTEGER,
  CONSTRAINT "sp_thread_replies_pkey" PRIMARY KEY("id"),
  CONSTRAINT "sp_thread_replies_threads_fk" FOREIGN KEY ("thread_id")
REFERENCES "public"."sp_threads"("id")
ON DELETE CASCADE
ON UPDATE NO ACTION
NOT DEFERRABLE,
  CONSTRAINT "sp_thread_replies_users_fk" FOREIGN KEY ("user_id")
REFERENCES "public"."sp_users"("id")
ON DELETE NO ACTION
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITH OIDS;

The table is a part of a custom forum engine. It stores all thread posts.
It's most often queried with SELECTs and INSERTSs, less often with UPDATEs,
and records are deleted quite seldom in normal operation of the application
(though we may delete records manually from the console from time to time). 


