Re: [PERFORM] "Vacuum Full Analyze" taking so long

2005-07-26 Thread Jochem van Dieten

Tomeh, Husam wrote:

The other question I have: what would be the proper approach to rebuilding
indexes? I reindex and then run vacuum/analyze. Should I not use the
reindex approach, and instead drop the indexes, vacuum the tables,
then create the indexes and run analyze on the tables and indexes?


If you just want to rebuild indexes, drop and recreate them.

However, you are also running a VACUUM FULL, so I presume you 
have deleted a significant number of rows and want to recover the 
space that was in use by them. In that scenario, it is often 
better to CLUSTER the table to force a rebuild. While VACUUM FULL 
moves the tuples around inside the existing file(s), CLUSTER 
simply creates new file(s), moves all the non-deleted tuples 
there and then swaps the old and the new files. There can be a 
significant performance increase in doing so (but you obviously 
need to have some free disk space).
If you CLUSTER your table it will be ordered by the index you 
specify. There can be a performance increase in doing so, but if 
you don't want the ordering you can also do a no-op ALTER TABLE and 
change a column to the datatype it already has. This too will force 
a rewrite of the table but without ordering the tuples.


So in short my recommendations:
- to rebuild indexes, just drop and recreate the indexes
- to rebuild everything because there is space that can be permanently 
reclaimed, drop indexes, cluster or alter the table, recreate the 
indexes and analyze the table (see the sketch below)
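
A minimal sketch of that second sequence (all table, column and index
names are placeholders; the CLUSTER syntax shown is the 8.0-era
"CLUSTER indexname ON tablename" form, and both CLUSTER and
ALTER TABLE ... TYPE take an exclusive lock while they rewrite the table):

  DROP INDEX orders_customer_idx;
  CLUSTER orders_pkey ON orders;    -- rewrites the table ordered by orders_pkey
  -- or, to rewrite without reordering, a no-op column retype:
  -- ALTER TABLE orders ALTER COLUMN note TYPE text;  -- same type it already has
  CREATE INDEX orders_customer_idx ON orders (customer_id);
  ANALYZE orders;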


Jochem



Re: [PERFORM] COPY insert performance

2005-07-26 Thread Gavin Sherry
Hi Chris,

Have you considered breaking the data into multiple chunks and COPYing
each concurrently?

Also, have you ensured that your table isn't storing OIDs?
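
To check, and to drop them if present (the table name is a placeholder):

  SELECT relhasoids FROM pg_class WHERE relname = 'mytable';
  ALTER TABLE mytable SET WITHOUT OIDS;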

On Mon, 25 Jul 2005, Chris Isaacson wrote:

> #---
> 
> # RESOURCE USAGE (except WAL)
> #---
> 
> shared_buffers = 65536  # min 16, at least max_connections*2, 8KB each

shared_buffers that high has been shown to hurt performance. Try 12000.

> wal_buffers = 64  # min 4, 8KB each

Increasing wal_buffers can also have an effect on performance.
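
For reference, as postgresql.conf lines (8KB pages, so 12000 buffers is
roughly 94MB; treat the wal_buffers line as something to benchmark rather
than a recommendation):

  shared_buffers = 12000   # ~94MB at 8KB per buffer
  #wal_buffers = ...       # benchmark different values; the build default is 8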

Thanks,

Gavin



Re: [PERFORM] COPY insert performance

2005-07-26 Thread Chris Isaacson
I do not have any foreign keys and I need the indexes on during the
insert/copy b/c in production a few queries heavily dependent on the
indexes will be issued.  These queries will be infrequent, but must be
fast when issued.

I am using InnoDB with MySQL which appears to enforce true transaction
support.  (http://dev.mysql.com/doc/mysql/en/innodb-overview.html)  If
not, how is InnoDB "cheating"?

Sorry for the confusion, but pg_xlog is currently on a dedicated drive
(10K SCSI, see below).  Would I realize further gains if I had a third
drive and put the indexes on that drive?  

I've played with the checkpoint_segments.  I noticed an enormous
improvement increasing from the default to 40, but negligible improvement
thereafter.  Do you have a recommendation for a value?

My bg_writer adjustments were a last ditch effort.  I found your advice
correct and realized no gain.  I have not tested under a querying load
which is a good next step.  I had not thought of the comparative
degradation of MySQL vs. PostgreSQL.

Thanks for the tip on the RAM usage by indexes.  I was under the
incorrect assumption that shared_buffers would take care of this.  I'll
increase work_mem to 512MB and rerun my test.  I have 1G of RAM, which
is less than we'll be running in production (likely 2G).

-Original Message-
From: John A Meinel [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 25, 2005 6:09 PM
To: Chris Isaacson; Postgresql Performance
Subject: Re: [PERFORM] COPY insert performance


Chris Isaacson wrote:
> I need COPY via libpqxx to insert millions of rows into two tables.
> One table has roughly half as many rows and requires half the storage.
> In production, the largest table will grow by ~30M rows/day.  To test
> the COPY performance I split my transactions into 10,000 rows.  I
> insert roughly 5000 rows into table A for every 10,000 rows into table B.
>
> Table A has one unique index:
>
> "order_main_pk" UNIQUE, btree (cl_ord_id)
>
> Table B has 1 unique index and 2 non-unique indexes:
>
> "order_transition_pk" UNIQUE, btree (collating_seq) 
> "order_transition_ak2" btree (orig_cl_ord_id) "order_transition_ak3" 
> btree (exec_id)

Do you have any foreign key references?
If you are creating a table for the first time (or loading a large
fraction of the data), it is common to drop the indexes and foreign keys
first, then insert/copy, and then recreate them afterwards.
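
A minimal sketch of that pattern around the order_main_pk index quoted
above (the table name and COPY source path are placeholders, and this
assumes order_main_pk is a plain unique index; if it backs a PRIMARY KEY
constraint you would drop/re-add the constraint with ALTER TABLE instead):

  DROP INDEX order_main_pk;
  COPY order_main FROM '/path/to/orders.copy';
  CREATE UNIQUE INDEX order_main_pk ON order_main (cl_ord_id);
  ANALYZE order_main;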

Is InnoDB the backend with referential integrity, and true transaction
support? I believe the default backend does not support either (so it is
"cheating" to give you speed, which may be just fine for your needs,
especially since you are willing to run fsync=false).

I think moving pg_xlog to a dedicated drive (set of drives) could help
your performance. As well as increasing checkpoint_segments.

I don't know if you gain much by changing the bg_writer settings; if you
are streaming everything in at once, you probably want to have it
written out right away. My understanding is that bg_writer settings are
for the case where you have mixed read and writes going on at the same
time, and you want to make sure that the reads have time to execute (ie
the writes are not saturating your IO).

Also, is any of this tested under load (a separate process issuing
queries while you are loading in data)? Traditionally MySQL is faster
with a single process inserting/querying data, but once you have
multiple processes hitting it at the same time, its performance
degrades much faster than postgres.

You also seem to be giving MySQL 512M of ram to work with, while only
giving 2M/200M to postgres. (re)creating indexes uses
maintenance_work_mem, but updating indexes could easily use work_mem.
You may be RAM starved.
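
For example, something like this per session before the load (a sketch
only -- on 8.0 these values are in kilobytes, and the numbers are just
starting points to tune from):

  SET maintenance_work_mem = 262144;  -- ~256MB for CREATE INDEX / REINDEX
  SET work_mem = 65536;               -- ~64MB for sorts and hashes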

John
=:->


>
> My testing environment is as follows:
> -Postgresql 8.0.1
> -libpqxx 2.5.0
> -Linux 2.6.11.4-21.7-smp x86_64
> -Dual Opteron 246
> -System disk (postgres data resides on this SCSI disk) -  Seagate
> (ST373453LC) - 15K, 73 GB
> (http://www.seagate.com/cda/products/discsales/marketing/detail/0,1081
> ,549,00.html)
> -2nd logical disk - 10K, 36GB IBM SCSI (IC35L036UCDY10-0) - WAL reside
> on this disk
> -NO RAID
>
> *PostgreSQL*
> Here are the results of copying in 10M rows as fast as possible:
> (10K/transaction)
> Total Time:          1129.556 s
> Rows/sec:            9899.922
> Transactions > 1.2s: 225
> Transactions > 1.5s: 77
> Transactions > 2.0s: 4
> Max Transaction:     2.325 s
>
> *MySQL*
> I ran a similar test with MySQL 4.1.10a (InnoDB) which produced these
> results (I used MySQL's INSERT INTO x VALUES
> (1,2,3),(4,5,6),(...,...,...) syntax; 10K/transaction):
> Total Time:          860.000 s
> Rows/sec:            11627.91
> Transactions > 1.2s: 0
> Transactions > 1.5s: 0
> Transactions > 2.0s: 0
> Max Transaction:     1.175 s
>
> Considering the configurations shown below, can anyone offer advice to
> close the 15% gap and the much worse variability I'm experiencing?
> Thanks
>
> My *postgresql.conf* has the foll


Re: [PERFORM] COPY insert performance

2005-07-26 Thread Chris Isaacson
I need the chunks for each table COPYed within the same transaction
which is why I'm not COPYing concurrently via multiple
threads/processes.  I will experiment w/o OIDs and with decreasing the
shared_buffers and wal_buffers.

Thanks,
Chris

-Original Message-
From: Gavin Sherry [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 26, 2005 7:12 AM
To: Chris Isaacson
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] COPY insert performance


Hi Chris,

Have you considered breaking the data into multiple chunks and COPYing
each concurrently?

Also, have you ensured that your table isn't storing OIDs?

On Mon, 25 Jul 2005, Chris Isaacson wrote:

> #---
> # RESOURCE USAGE (except WAL)
> #---
> 
> shared_buffers = 65536  # min 16, at least max_connections*2, 8KB each

shared_buffers that high has been shown to hurt performance. Try
12000.

> wal_buffers = 64  # min 4, 8KB each

Increasing wal_buffers can also have an effect on performance.

Thanks,

Gavin



Re: [PERFORM] COPY insert performance

2005-07-26 Thread Jeff Trout


On Jul 26, 2005, at 8:15 AM, Chris Isaacson wrote:


I am using InnoDB with MySQL which appears to enforce true transaction
support.  (http://dev.mysql.com/doc/mysql/en/innodb-overview.html)  If
not, how is InnoDB "cheating"?



are you sure your tables are InnoDB?
chances are high that they are MyISAM unless you explicitly stated
"type = innodb" when creating them.


look at "show table status" output to verify.



I've played with the checkpoint_segments.  I noticed an enormous
improvement increasing from the default to 40, but negligible improvement
thereafter.  Do you have a recommendation for a value?


there's been a thread on -hackers recently about checkpoint issues...
in a nutshell there isn't much to do.  But I'd say give bizgres a
try if you're going to be continually loading huge amounts of data.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/





Re: [PERFORM] Looking for tips

2005-07-26 Thread Vivek Khera


On Jul 19, 2005, at 3:01 PM, Tom Lane wrote:

You could possibly get some improvement if you can re-use prepared plans
for the queries; but this will require some fooling with the client code
(I'm not sure if DBD::Pg even has support for it at all).



DBD::Pg 1.40+ by default uses server-side prepared statements when  
you do $dbh->prepare() against an 8.x database server.
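
(For reference, the same server-side facility can be exercised directly in
SQL -- the statement name, table and query here are hypothetical:)

  PREPARE get_order (int) AS
  SELECT * FROM orders WHERE id = $1;
  EXECUTE get_order(42);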


Vivek Khera, Ph.D.
+1-301-869-4449 x806






[PERFORM] Cheap RAM disk?

2005-07-26 Thread John A Meinel
I saw a review of a relatively inexpensive RAM disk over at 
anandtech.com, the Gigabyte i-RAM

http://www.anandtech.com/storage/showdoc.aspx?i=2480

Basically, it is a PCI card, which takes standard DDR RAM, and has a 
SATA port on it, so that to the system, it looks like a normal SATA drive.


The card costs about $100-150, and you fill it with your own ram, so for 
a 4GB (max size) disk, it costs around $500. Looking for solid state 
storage devices, the cheapest I found was around $5k for 2GB.


Gigabyte claims that the battery backup can last up to 16h, which seems 
decent, if not really long (the $5k solution has a built-in harddrive so 
that if the power goes out, it uses the battery power to copy the 
ramdisk onto the harddrive for more permanent storage).


Anyway, would something like this be reasonable as a drive for storing 
pg_xlog? With 4GB you could have as many as 256 checkpoint segments.
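
(For reference, assuming the default 16MB WAL segment size, that is simply
4GB / 16MB per segment = 256 segment files.)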


I'm a little leery as it is definitely a version 1.0 product (it is 
still using an FPGA as the controller, so they were obviously pushing to 
get the card into production).


But it seems like this might be a decent way to improve insert 
performance, without setting fsync=false.


Probably it should see some serious testing (as in power spikes/pulled 
plugs, etc). I know the article made some claim that if you actually 
pull out the card it goes into "high consumption mode" which is somehow 
greater than if you leave it in the slot with the power off. Which to me 
seems like a lot of bull, and really means the 16h is only under 
best-case circumstances. But even 1-2h is sufficient to handle a simple 
power outage.


And if you had a UPS with detection of power failure, you could always 
sync the ramdisk to a local partition before the power goes out. Though 
you could do that with a normal in-memory ramdisk (tmpfs) without having 
to buy the card. Though it does give you up to an extra 4GB of ram, for 
machines which have already maxed out their slots.


Anyway, I thought I would mention it to the list, to see if anyone else 
has heard of it, or has any thoughts on the matter. I'm sure there are 
some people who are using more expensive ram disks, maybe they have some 
ideas about what this device is missing. (other than costing about 
1/10th the price)


John
=:->





[PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Dan Harris
I am working on a process that will be inserting tens of millions of rows 
and need this to be as quick as possible.


The catch is that for each row I could potentially insert, I need to  
look and see if the relationship is already there  to prevent  
multiple entries.  Currently I am doing a SELECT before doing the  
INSERT, but I recognize the speed penalty in doing two operations.  I 
wonder if there is some way I can say "insert this record, only if it  
doesn't exist already".  To see if it exists, I would need to compare  
3 fields instead of just enforcing a primary key.


Even if this could be a small increase per record, even a few percent  
faster compounded over the whole load could be a significant reduction.


Thanks for any ideas you might have.

-Dan



Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Chris Browne
[EMAIL PROTECTED] (John A Meinel) writes:
> I saw a review of a relatively inexpensive RAM disk over at
> anandtech.com, the Gigabyte i-RAM
> http://www.anandtech.com/storage/showdoc.aspx?i=2480

And the review shows that it's not *all* that valuable for many of the
cases they looked at.

> Basically, it is a PCI card, which takes standard DDR RAM, and has a
> SATA port on it, so that to the system, it looks like a normal SATA
> drive.
>
> The card costs about $100-150, and you fill it with your own ram, so
> for a 4GB (max size) disk, it costs around $500. Looking for solid
> state storage devices, the cheapest I found was around $5k for 2GB.
>
> Gigabyte claims that the battery backup can last up to 16h, which
> seems decent, if not really long (the $5k solution has a built-in
> harddrive so that if the power goes out, it uses the battery power to
> copy the ramdisk onto the harddrive for more permanent storage).
>
> Anyway, would something like this be reasonable as a drive for storing
> pg_xlog? With 4GB you could have as many as 256 checkpoint segments.
>
> I'm a little leary as it is definitely a version 1.0 product (it is
> still using an FPGA as the controller, so they were obviously pushing
> to get the card into production).

What disappoints me is that nobody has tried the CF/RAM answer; rather
than putting a hard drive on the board, you put on some form of flash
device (CompactFlash or such), where if power fails, it pushes data
onto the CF.  That ought to be cheaper (both in terms of hardware cost
and power consumption) than using a hard disk.

> But it seems like this might be a decent way to improve insert
> performance, without setting fsync=false.

That's the case which might prove Ludicrously Quicker than any of the
sample cases in the review.

> Probably it should see some serious testing (as in power spikes/pulled
> plugs, etc). I know the article made some claim that if you actually
> pull out the card it goes into "high consumption mode" which is
> somehow greater than if you leave it in the slot with the power
> off. Which to me seems like a lot of bull, and really means the 16h is
> only under best-case circumstances. But even 1-2h is sufficient to
> handle a simple power outage.

Certainly.

> Anyway, I thought I would mention it to the list, to see if anyone
> else has heard of it, or has any thoughts on the matter. I'm sure
> there are some people who are using more expensive ram disks, maybe
> they have some ideas about what this device is missing. (other than
> costing about 1/10th the price)

Well, if it hits a "2.0" version, it may get interesting...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"




Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread John A Meinel

Dan Harris wrote:
I am working on a process that will be inserting tens of million rows  
and need this to be as quick as possible.


The catch is that for each row I could potentially insert, I need to  
look and see if the relationship is already there  to prevent  multiple 
entries.  Currently I am doing a SELECT before doing the  INSERT, but I 
recognize the speed penalty in doing to operations.  I  wonder if there 
is some way I can say "insert this record, only if it  doesn't exist 
already".  To see if it exists, I would need to compare  3 fields 
instead of just enforcing a primary key.


Even if this could be a small increase per record, even a few percent  
faster compounded over the whole load could be a significant reduction.


Thanks for any ideas you might have.

-Dan



You could insert all of your data into a temporary table, and then do:

INSERT INTO final_table SELECT * FROM temp_table WHERE NOT EXISTS
(SELECT 1 FROM final_table WHERE final_table.id = temp_table.id
 AND final_table.path = temp_table.path AND final_table.y = temp_table.y);


Or you could load it into the temporary table, and then:
DELETE FROM temp_table WHERE EXISTS (SELECT 1 FROM final_table WHERE id...);

And then do a plain INSERT INTO.

I can't say what the specific performance increases would be, but 
temp_table could certainly be an actual TEMP table (meaning it only 
exists during the connection), and you could easily do a COPY into that 
table to load it up quickly, without having to check any constraints.


Just a thought,
John
=:->





Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Jeffrey W. Baker
On Tue, 2005-07-26 at 11:34 -0500, John A Meinel wrote:
> I saw a review of a relatively inexpensive RAM disk over at 
> anandtech.com, the Gigabyte i-RAM
> http://www.anandtech.com/storage/showdoc.aspx?i=2480
> 
> Basically, it is a PCI card, which takes standard DDR RAM, and has a 
> SATA port on it, so that to the system, it looks like a normal SATA drive.
> 
> The card costs about $100-150, and you fill it with your own ram, so for 
> a 4GB (max size) disk, it costs around $500. Looking for solid state 
> storage devices, the cheapest I found was around $5k for 2GB.
> 
> Gigabyte claims that the battery backup can last up to 16h, which seems 
> decent, if not really long (the $5k solution has a built-in harddrive so 
> that if the power goes out, it uses the battery power to copy the 
> ramdisk onto the harddrive for more permanent storage).
> 
> Anyway, would something like this be reasonable as a drive for storing 
> pg_xlog? With 4GB you could have as many as 256 checkpoint segments.

I haven't tried this product, but the microbenchmarks seem truly slow.
I think you would get a similar benefit by simply sticking a 1GB or 2GB
DIMM -- battery-backed, of course -- in your RAID controller.

-jwb



Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Vivek Khera


On Jul 26, 2005, at 12:34 PM, John A Meinel wrote:

Basically, it is a PCI card, which takes standard DDR RAM, and has  
a SATA port on it, so that to the system, it looks like a normal  
SATA drive.


The card costs about $100-150, and you fill it with your own ram,  
so for a 4GB (max size) disk, it costs around $500. Looking for  
solid state storage devices, the cheapest I found was around $5k  
for 2GB.




gotta love /. don't ya?

This card doesn't accept ECC RAM therefore it is nothing more than a  
toy.  I wouldn't trust it as far as I could throw it.


There are other vendors of SSD's out there.  Some even have *real*  
power fail strategies such as dumping to a physical disk.  These are  
not cheap, but you gets what ya pays for...


Vivek Khera, Ph.D.
+1-301-869-4449 x806






Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread PFC



I'm a little leary as it is definitely a version 1.0 product (it is
still using an FPGA as the controller, so they were obviously pushing to
get the card into production).


	Not necessarily. FPGA's have become a sensible choice now. My RME studio  
soundcard uses a big FPGA.


	The performance in the test doesn't look that good, though, but don't  
forget it was run under windows. For instance they get 77s to copy the  
Firefox source tree on their Athlon 64/raptor ; my Duron / 7200rpm ide  
drive does it in 30 seconds, but not with windows of course.


	However it doesn't use ECC so... That's a pity, because they could have  
implemented ECC in "software" inside the chip, and have the benefits of  
error correction with normal, cheap RAM.


Well; wait and see...



Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Luke Lonergan
Yup - interesting and very niche product - it seems like its only obvious
application is for the Postgresql WAL problem :-)

The real differentiator is the battery backup part.  Otherwise, the
filesystem caching is more effective, so put the RAM on the motherboard.

- Luke





Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread John A Meinel

Luke Lonergan wrote:

Yup - interesting and very niche product - it seems like it's only obvious
application is for the Postgresql WAL problem :-)


Well, you could do it for any journaled system (XFS, JFS, ext3, reiserfs).

But yes, it seems specifically designed for a battery backed journal. 
Though the article reviews it for very different purposes.


Though it was a Windows review, and I don't know of any way to make NTFS 
use a separate device for a journal. (Though I expect it is possible 
somehow).


John
=:->




The real differentiator is the battery backup part.  Otherwise, the
filesystem caching is more effective, so put the RAM on the motherboard.

- Luke








Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Luke Lonergan
John,

On 7/26/05 9:56 AM, "John A Meinel" <[EMAIL PROTECTED]> wrote:

> You could insert all of your data into a temporary table, and then do:
> 
> INSERT INTO final_table SELECT * FROM temp_table WHERE NOT EXISTS
> (SELECT info FROM final_table WHERE id=id, path=path, y=y);
> 
> Or you could load it into the temporary table, and then:
> DELETE FROM temp_table WHERE EXISTS (SELECT FROM final_table WHERE id...);
> 
> And then do a plain INSERT INTO.
> 
> I can't say what the specific performance increases would be, but
> temp_table could certainly be an actual TEMP table (meaning it only
> exists during the connection), and you could easily do a COPY into that
> table to load it up quickly, without having to check any constraints.

Yah - that's a typical approach, and it would be excellent if the COPY
bypassed WAL for the temp table load.  This is something we discussed in
bizgres development a while back.  I think we should do this for sure -
would nearly double the temp table load rate, and the subsequent temp table
delete *should* be fast enough (?)  Any performance tests you've done on
that delete/subselect operation?

- Luke





Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Sven Willenberger
On Tue, 2005-07-26 at 10:50 -0600, Dan Harris wrote:
> I am working on a process that will be inserting tens of million rows  
> and need this to be as quick as possible.
> 
> The catch is that for each row I could potentially insert, I need to  
> look and see if the relationship is already there  to prevent  
> multiple entries.  Currently I am doing a SELECT before doing the  
> INSERT, but I recognize the speed penalty in doing to operations.  I  
> wonder if there is some way I can say "insert this record, only if it  
> doesn't exist already".  To see if it exists, I would need to compare  
> 3 fields instead of just enforcing a primary key.
> 
> Even if this could be a small increase per record, even a few percent  
> faster compounded over the whole load could be a significant reduction.
> 
> Thanks for any ideas you might have.
> 

Perhaps a trigger:

CREATE FUNCTION verify_unique() RETURNS TRIGGER AS $func$
BEGIN
    PERFORM a, b, c FROM table1 WHERE a = NEW.a AND b = NEW.b AND c = NEW.c;
    IF FOUND THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$func$ LANGUAGE plpgsql STABLE;

CREATE TRIGGER verify_unique BEFORE INSERT ON table1 FOR EACH ROW
EXECUTE PROCEDURE verify_unique();

Triggers are fired on COPY commands and if table1 is able to be cached
and you have an index on table1(a,b,c) the results should be fairly
decent. I would be interested in seeing the difference in timing between
this approach and the temp table approach.
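
(The supporting index would be something like the following -- the index
name is arbitrary:)

  CREATE INDEX table1_abc_idx ON table1 (a, b, c);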




Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Alex Turner
Also seems pretty silly to put it on a regular SATA connection, when
all that can manage is 150MB/sec.  If you made it connect directly
to 66MHz/64-bit PCI then it could actually _use_ the speed of the RAM, not
to mention PCI-X.

Alex Turner
NetEconomist

On 7/26/05, John A Meinel <[EMAIL PROTECTED]> wrote:
> I saw a review of a relatively inexpensive RAM disk over at
> anandtech.com, the Gigabyte i-RAM
> http://www.anandtech.com/storage/showdoc.aspx?i=2480
> 
> Basically, it is a PCI card, which takes standard DDR RAM, and has a
> SATA port on it, so that to the system, it looks like a normal SATA drive.
> 
> The card costs about $100-150, and you fill it with your own ram, so for
> a 4GB (max size) disk, it costs around $500. Looking for solid state
> storage devices, the cheapest I found was around $5k for 2GB.
> 
> Gigabyte claims that the battery backup can last up to 16h, which seems
> decent, if not really long (the $5k solution has a built-in harddrive so
> that if the power goes out, it uses the battery power to copy the
> ramdisk onto the harddrive for more permanent storage).
> 
> Anyway, would something like this be reasonable as a drive for storing
> pg_xlog? With 4GB you could have as many as 256 checkpoint segments.
> 
> I'm a little leary as it is definitely a version 1.0 product (it is
> still using an FPGA as the controller, so they were obviously pushing to
> get the card into production).
> 
> But it seems like this might be a decent way to improve insert
> performance, without setting fsync=false.
> 
> Probably it should see some serious testing (as in power spikes/pulled
> plugs, etc). I know the article made some claim that if you actually
> pull out the card it goes into "high consumption mode" which is somehow
> greater than if you leave it in the slot with the power off. Which to me
> seems like a lot of bull, and really means the 16h is only under
> best-case circumstances. But even 1-2h is sufficient to handle a simple
> power outage.
> 
> And if you had a UPS with detection of power failure, you could always
> sync the ramdisk to a local partition before the power goes out. Though
> you could do that with a normal in-memory ramdisk (tmpfs) without having
> to buy the card. Though it does give you up-to an extra 4GB of ram, for
> machines which have already maxed out their slots.
> 
> Anyway, I thought I would mention it to the list, to see if anyone else
> has heard of it, or has any thoughts on the matter. I'm sure there are
> some people who are using more expensive ram disks, maybe they have some
> ideas about what this device is missing. (other than costing about
> 1/10th the price)
> 
> John
> =:->
> 
> 
> 
>



Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Alex Turner
Please see:

http://www.newegg.com/Product/Product.asp?Item=N82E16820145309
and
http://www.newegg.com/Product/Product.asp?Item=N82E16820145416

The price of Reg ECC is not significantly higher than regular ram at
this point.  Plus if you go with super fast 2-2-2-6 then it's actually
more expensive than good ol' 2.5 Reg ECC.

Alex Turner
NetEconomist

On 7/26/05, PFC <[EMAIL PROTECTED]> wrote:
> 
> > I'm a little leary as it is definitely a version 1.0 product (it is
> > still using an FPGA as the controller, so they were obviously pushing to
> > get the card into production).
> 
> Not necessarily. FPGA's have become a sensible choice now. My RME 
> studio
> soundcard uses a big FPGA.
> 
> The performance in the test doesn't look that good, though, but don't
> forget it was run under windows. For instance they get 77s to copy the
> Firefox source tree on their Athlon 64/raptor ; my Duron / 7200rpm ide
> drive does it in 30 seconds, but not with windows of course.
> 
> However it doesnt' use ECC so... That's a pity, because they could 
> have
> implemented ECC in "software" inside the chip, and have the benefits of
> error correction with normal, cheap RAM.
> 
> Well; wait and see...
> 
>



Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread John A Meinel

Alex Turner wrote:

Also seems pretty silly to put it on a regular SATA connection, when
all that can manage is 150MB/sec.  If you made it connection directly
to 66/64-bit PCI then it could actualy _use_ the speed of the RAM, not
to mention PCI-X.

Alex Turner
NetEconomist



Well, the whole point is to have it look like a normal SATA drive, even 
to the point that you can boot off of it, without having to load a 
single driver.


Now, you could argue that they could implement a SATA controller on the 
card, with a SATA BIOS, etc. And then you could get the increased speed, 
and still have bootable functionality.


But it is a version 1.0 product, and I'm sure they tried to make it 
as cheap as possible (and within their own capabilities.)


John
=:->





Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Michael Stone

On Tue, Jul 26, 2005 at 11:23:23AM -0700, Luke Lonergan wrote:

Yup - interesting and very niche product - it seems like it's only obvious
application is for the Postgresql WAL problem :-)


On the contrary--it's not obvious that it is an ideal fit for a WAL. A
ram disk like this is optimized for highly random access applications.
The WAL is a single sequential writer. If you're in the kind of market
that needs a really high performance WAL you'd be much better served by
putting a big NVRAM cache in front of a fast disk array than by buying a
toy like this. 


Mike Stone



Re: [Bizgres-general] Re: [PERFORM] faster INSERT with possible

2005-07-26 Thread Luke Lonergan
Hannu,

On 7/26/05 11:56 AM, "Hannu Krosing" <[EMAIL PROTECTED]> wrote:

> On T, 2005-07-26 at 11:46 -0700, Luke Lonergan wrote:
> 
>> Yah - that's a typical approach, and it would be excellent if the COPY
>> bypassed WAL for the temp table load.
> 
> Don't *all* operations on TEMP tables bypass WAL ?

Good question - do they?  We had discussed the bypass as an elective option,
or an automated one for special conditions (no index on table, empty table)
or both.  I thought that temp tables were one of those special conditions.

Well - now that I test it, it appears you are correct, temp table COPY
bypasses WAL - thanks for pointing it out!

The following test is on a load of 200MB of table data from an ASCII file
with 1 text column of size 145MB.

- Luke

=== TEST ===
dgtestdb=# create temporary table temp1 (a text);
CREATE TABLE
dgtestdb=# \timing
Timing is on.
dgtestdb=# \i copy.ctl
COPY
Time: 4549.212 ms
dgtestdb=# \i copy.ctl
COPY
Time: 3897.395 ms

-- that's two tests, two loads of 200MB each, averaging 4.2 secs

dgtestdb=# create table temp2 as select * from temp1;
SELECT
Time: 5914.803 ms

-- a quick comparison to "CREATE TABLE AS SELECT", which bypasses WAL
-- on bizgres

dgtestdb=# drop table temp1;
DROP TABLE
Time: 135.782 ms
dgtestdb=# drop table temp2;
DROP TABLE
Time: 3.707 ms
dgtestdb=# create table temp1 (a text);
CREATE TABLE
Time: 1.667 ms
dgtestdb=# \i copy.ctl
COPY
Time: 6034.274 ms
dgtestdb=# 

-- This was a non-temporary table COPY, showing the slower performance of 6
secs.

- Luke





[PERFORM] [IMPORTANT] - My application performance

2005-07-26 Thread Roberto Germano Vieweg Neto
My application is using Firebird 1.5.2

I have at my database:
- 150 Domains
- 318 tables
- 141 Views
- 365 Procedures
- 407 Triggers
- 75 generators
- 161 Exceptions
- 183 UDFs
- 1077 Indexes

My question is:

Will PostgreSQL be faster than Firebird? How much (in percent)?

I need about 20% to 50% more performance in my application.
Can I get this by migrating to PostgreSQL?




Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Matthew Nuzum
On 7/26/05, Dan Harris <[EMAIL PROTECTED]> wrote:
> I am working on a process that will be inserting tens of million rows
> and need this to be as quick as possible.
> 
> The catch is that for each row I could potentially insert, I need to
> look and see if the relationship is already there  to prevent
> multiple entries.  Currently I am doing a SELECT before doing the
> INSERT, but I recognize the speed penalty in doing to operations.  I
> wonder if there is some way I can say "insert this record, only if it
> doesn't exist already".  To see if it exists, I would need to compare
> 3 fields instead of just enforcing a primary key.

I struggled with this for a while. At first I tried stored procedures
and triggers, but it took very long (over 24 hours for my dataset).
After several iterations of rewriting it, first into C# then into
Python I got the whole process down to under 30 min.

My scenario is this:
I want to normalize log data. For example, for the IP address in a log
entry, I need to look up the unique id of the IP address, or if the IP
address is new, insert it and then return the newly created entry.
Multiple processes use the data, but only one process, run daily,
actually changes it. Because this one process knows that the data is
static, it selects the tables into in-memory hash tables (C#) or
Dictionaries (Python) and then does the lookups there. It is *super*
fast, but it uses a *lot* of ram. ;-)

To limit the ram, I wrote a version of the python code that uses gdbm
files instead of Dictionaries. This requires a newer version of Python
(to allow a gdbm db to work just like a dictionary) but makes life
easier in case someone is using my software on a lower end machine.
This doubled the time of the lookups from about 15 minutes to 30,
bringing the whole process to about 45 minutes.

-- 
Matthew Nuzum
www.bearfruit.org



Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread John A Meinel

Matthew Nuzum wrote:

On 7/26/05, Dan Harris <[EMAIL PROTECTED]> wrote:


I am working on a process that will be inserting tens of million rows
and need this to be as quick as possible.

The catch is that for each row I could potentially insert, I need to
look and see if the relationship is already there  to prevent
multiple entries.  Currently I am doing a SELECT before doing the
INSERT, but I recognize the speed penalty in doing to operations.  I
wonder if there is some way I can say "insert this record, only if it
doesn't exist already".  To see if it exists, I would need to compare
3 fields instead of just enforcing a primary key.



I struggled with this for a while. At first I tried stored procedures
and triggers, but it took very long (over 24 hours for my dataset).
After several iterations of rewritting it, first into C# then into
Python I got the whole process down to under 30 min.

My scenario is this:
I want to normalize log data. For example, for the IP address in a log
entry, I need to look up the unique id of the IP address, or if the IP
address is new, insert it and then return the newly created entry.
Multiple processes use the data, but only one process, run daily,
actually changes it. Because this one process knows that the data is
static, it selects the tables into in-memory hash tables (C#) or
Dictionaries (Python) and then does the lookups there. It is *super*
fast, but it uses a *lot* of ram. ;-)

To limit the ram, I wrote a version of the python code that uses gdbm
files instead of Dictionaries. This requires a newer version of Python
(to allow a gdbm db to work just like a dictionary) but makes life
easier in case someone is using my software on a lower end machine.
This doubled the time of the lookups from about 15 minutes to 30,
bringing the whole process to about 45 minutes.



Did you ever try the temp table approach? You could:

COPY all records into temp_table, leaving the ip_id column empty
-- Get any entries which already exist
UPDATE temp_table SET ip_id =
    (SELECT ip_id FROM ipaddress WHERE ipaddress.add = temp_table.add)
 WHERE EXISTS (SELECT 1 FROM ipaddress WHERE ipaddress.add = temp_table.add);
-- Create new entries
INSERT INTO ipaddress(add) SELECT add FROM temp_table
  WHERE ip_id IS NULL;
-- Update the rest
UPDATE temp_table SET ip_id =
    (SELECT ip_id FROM ipaddress WHERE ipaddress.add = temp_table.add)
 WHERE ip_id IS NULL AND
    EXISTS (SELECT 1 FROM ipaddress WHERE ipaddress.add = temp_table.add);

This would let the database do all of the updating work in bulk on its 
side, rather than you pulling all the data out and doing it locally.


An alternative would be something like:

CREATE TEMP TABLE new_ids (address text, ip_id int);
COPY all potentially new addresses into that table.
-- Delete all entries which already exist
DELETE FROM new_ids WHERE EXISTS
(SELECT ip_id FROM ipaddresses
  WHERE add=new_ids.address);
-- Now create the new entries
INSERT INTO ipaddresses(add) SELECT address FROM new_ids;

-- At this point you are guaranteed to have all addresses existing in
-- the database

If you then insert your full data into the final table, only leave the 
ip_id column as null. Then if you have a partial index where ip_id is 
NULL, you could use the command:


UPDATE final_table SET ip_id =
(SELECT ip_id FROM ipaddresses WHERE add=final_table.add)
WHERE ip_id IS NULL;

You could also do this in a temporary table, before bulk inserting into 
the final table.


I don't know what you have tried, but I know that for Dan, he easily has 
> 36M rows. So I don't think he wants to pull that locally and create an 
in-memory hash just to insert 100 rows or so.


Also, for your situation, if you do keep a local cache, you could 
certainly save the cache between runs, and use a temp table to determine 
what new ids you need to add to it. Then you wouldn't have to pull the 
complete set each time. You just pull new values for entries you haven't 
added yet.


John
=:->




Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Richard_D_Levine
> you'd be much better served by
> putting a big NVRAM cache in front of a fast disk array

I agree with the point below, but I think price was the issue of the
original discussion.  That said, it seems that a single high speed spindle
would give this a run for its money in both price and performance, and for
the same reasons Mike points out.  Maybe a SCSI 160 or 320 at 15k, or maybe
even something slower.

Rick

[EMAIL PROTECTED] wrote on 07/26/2005 01:33:43 PM:

> On Tue, Jul 26, 2005 at 11:23:23AM -0700, Luke Lonergan wrote:
> >Yup - interesting and very niche product - it seems like it's only
obvious
> >application is for the Postgresql WAL problem :-)
>
> On the contrary--it's not obvious that it is an ideal fit for a WAL. A
> ram disk like this is optimized for highly random access applications.
> The WAL is a single sequential writer. If you're in the kind of market
> that needs a really high performance WAL you'd be much better served by
> putting a big NVRAM cache in front of a fast disk array than by buying a
> toy like this.
>
> Mike Stone
>




Re: [PERFORM] [IMPORTANT] - My application performance

2005-07-26 Thread Jim C. Nasby
The number of objects in your system has virtually nothing to do with
performance (at least on any decent database...)

What is your application doing? What's the bottleneck right now?

On Tue, Jul 26, 2005 at 04:35:19PM -0300, Roberto Germano Vieweg Neto wrote:
> My application is using Firebird 1.5.2
> 
> I have at my database:
> - 150 Doamins
> - 318 tables
> - 141 Views
> - 365 Procedures
> - 407 Triggers
> - 75 generators
> - 161 Exceptions
> - 183 UDFs
> - 1077 Indexes
> 
> My question is:
> 
> Postgre SQL will be more faster than Firebird? How much (in percent)?
> 
> I need about 20% to 50% more performance at my application.
> Can I get this migratin to postgresql ?
> 
> 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



RES: [PERFORM] [IMPORTANT] - My application performance

2005-07-26 Thread Alvaro Neto
What's your platform?  Windows or Linux?
What's the data volume (up to how many million records)?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On behalf of Roberto Germano
Vieweg Neto
Sent: Tuesday, July 26, 2005 16:35
To: pgsql-performance@postgresql.org
Subject: [PERFORM] [IMPORTANT] - My application performance

My application is using Firebird 1.5.2

I have at my database:
- 150 Domains
- 318 tables
- 141 Views
- 365 Procedures
- 407 Triggers
- 75 generators
- 161 Exceptions
- 183 UDFs
- 1077 Indexes

My question is:

Will PostgreSQL be faster than Firebird? How much (in percent)?

I need about 20% to 50% more performance in my application.
Can I get this by migrating to PostgreSQL?




Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Mark Lewis
Easier and faster than doing the custom trigger is to simply define a
unique index and let the DB enforce the constraint with an index lookup,
something like:

create unique index happy_index ON happy_table(col1, col2, col3);

That should run faster than the custom trigger, but not as fast as the
temp table solution suggested elsewhere because it will need to do an
index lookup for each row.  With this solution, it is important that
your shared_buffers are set high enough that the happy_index can be kept
in memory, otherwise performance will drop precipitously.  Also, if you
are increasing the size of the table by a large percentage, you will
want to ANALYZE periodically, as an optimal plan for a small table may
be a disaster for a large table, and PostgreSQL won't switch plans
unless you run ANALYZE.
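
One way to actually consume the duplicate-key errors during the load is a
small plpgsql wrapper -- a hypothetical sketch only (function, table and
column names are placeholders, and it assumes a version with plpgsql
exception handling, i.e. 8.0+). Note the exception block starts a
subtransaction per call, which is part of why this stays slower than the
temp table approach:

  CREATE OR REPLACE FUNCTION insert_if_new(v1 int, v2 int, v3 int)
  RETURNS void AS $$
  BEGIN
      INSERT INTO happy_table (col1, col2, col3) VALUES (v1, v2, v3);
  EXCEPTION WHEN unique_violation THEN
      -- this (col1, col2, col3) combination already exists; skip it
      RETURN;
  END;
  $$ LANGUAGE plpgsql;

  -- usage: SELECT insert_if_new(1, 2, 3);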

-- Mark

On Tue, 2005-07-26 at 14:51 -0500, John A Meinel wrote:
> Matthew Nuzum wrote:
> > On 7/26/05, Dan Harris <[EMAIL PROTECTED]> wrote:
> > 
> >>I am working on a process that will be inserting tens of million rows
> >>and need this to be as quick as possible.
> >>
> >>The catch is that for each row I could potentially insert, I need to
> >>look and see if the relationship is already there  to prevent
> >>multiple entries.  Currently I am doing a SELECT before doing the
> >>INSERT, but I recognize the speed penalty in doing to operations.  I
> >>wonder if there is some way I can say "insert this record, only if it
> >>doesn't exist already".  To see if it exists, I would need to compare
> >>3 fields instead of just enforcing a primary key.
> > 
> > 
> > I struggled with this for a while. At first I tried stored procedures
> > and triggers, but it took very long (over 24 hours for my dataset).
> > After several iterations of rewritting it, first into C# then into
> > Python I got the whole process down to under 30 min.
> > 
> > My scenario is this:
> > I want to normalize log data. For example, for the IP address in a log
> > entry, I need to look up the unique id of the IP address, or if the IP
> > address is new, insert it and then return the newly created entry.
> > Multiple processes use the data, but only one process, run daily,
> > actually changes it. Because this one process knows that the data is
> > static, it selects the tables into in-memory hash tables (C#) or
> > Dictionaries (Python) and then does the lookups there. It is *super*
> > fast, but it uses a *lot* of ram. ;-)
> > 
> > To limit the ram, I wrote a version of the python code that uses gdbm
> > files instead of Dictionaries. This requires a newer version of Python
> > (to allow a gdbm db to work just like a dictionary) but makes life
> > easier in case someone is using my software on a lower end machine.
> > This doubled the time of the lookups from about 15 minutes to 30,
> > bringing the whole process to about 45 minutes.
> > 
> 
> Did you ever try the temp table approach? You could:
> 
> COPY all records into temp_table, with an empty row for ip_id
> -- Get any entries which already exist
> UPDATE temp_table SET ip_id =
>   (SELECT ip_id from ipaddress WHERE add=add)
>   WHERE EXISTS (SELECT ip_id FROM ipaddress WHERE add=add);
> -- Create new entries
> INSERT INTO ipaddress(add) SELECT add FROM temp_table
>WHERE ip_id IS NULL;
> -- Update the rest
> UPDATE temp_table SET ip_id =
>   (SELECT ip_id from ipaddress WHERE add=add)
>   WHERE ip_id IS NULL AND
>   EXISTS (SELECT ip_id FROM ipaddress WHERE add=add);
> 
> This would let the database do all of the updating work in bulk on it's 
> side, rather than you pulling all the data out and doing it locally.
> 
> An alternative would be something like:
> 
> CREATE TEMP TABLE new_ids (address text, ip_id int);
> COPY all potentially new addresses into that table.
> -- Delete all entries which already exist
> DELETE FROM new_ids WHERE EXISTS
>   (SELECT ip_id FROM ipaddresses
> WHERE add=new_ids.address);
> -- Now create the new entries
> INSERT INTO ipaddresses(add) SELECT address FROM new_ids;
> 
> -- At this point you are guaranteed to have all addresses existing in
> -- the database
> 
> If you then insert your full data into the final table, only leave the 
> ip_id column as null. Then if you have a partial index where ip_id is 
> NULL, you could use the command:
> 
> UPDATE final_table SET ip_id =
>   (SELECT ip_id FROM ipaddresses WHERE add=final_table.add)
> WHERE ip_id IS NULL;
> 
> You could also do this in a temporary table, before bulk inserting into 
> the final table.
> 
> I don't know what you have tried, but I know that for Dan, he easily has 
>  > 36M rows. So I don't think he wants to pull that locally and create a 
> in-memory hash just to insert 100 rows or so.
> 
> Also, for your situation, if you do keep a local cache, you could 
> certainly save the cache between runs, and use a temp table to determine 
> what new ids you need to add to it. Then you wouldn't have to pull the 
> complete set each time. You just pull new value

Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Chris Browne
[EMAIL PROTECTED] ("Jeffrey W. Baker") writes:
> I haven't tried this product, but the microbenchmarks seem truly
> slow.  I think you would get a similar benefit by simply sticking a
> 1GB or 2GB DIMM -- battery-backed, of course -- in your RAID
> controller.

Well, the microbenchmarks were pretty pre-sophomoric, essentially
trying to express how the device would be useful to a Windows user
that *might* play games...

I'm sure it's hurt by the fact that it's using a SATA ("version 1")
interface rather than something faster.

Mind you, I'd like to see the product succeed, because they might come
up with a "version 2" of it that is what I'd really like...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78.  "I will not tell my Legions of Terror
"And he must  be taken alive!" The command will be:  ``And try to take
him alive if it is reasonably practical.''"




Re: [PERFORM] faster INSERT with possible pre-existing row?

2005-07-26 Thread Christopher Kings-Lynne
Insert into a temp table, then use INSERT INTO ... SELECT FROM to insert
into the proper table all the rows that don't already have a relationship
(a sketch follows below).
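
A minimal sketch of that (staging/final table and column names are
placeholders; the COPY source path is whatever you load from):

  COPY staging_table FROM '/path/to/data.copy';
  INSERT INTO final_table
  SELECT s.*
    FROM staging_table s
   WHERE NOT EXISTS (SELECT 1 FROM final_table f
                      WHERE f.col1 = s.col1
                        AND f.col2 = s.col2
                        AND f.col3 = s.col3);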


Chris

Dan Harris wrote:
I am working on a process that will be inserting tens of million rows  
and need this to be as quick as possible.


The catch is that for each row I could potentially insert, I need to  
look and see if the relationship is already there  to prevent  multiple 
entries.  Currently I am doing a SELECT before doing the  INSERT, but I 
recognize the speed penalty in doing to operations.  I  wonder if there 
is some way I can say "insert this record, only if it  doesn't exist 
already".  To see if it exists, I would need to compare  3 fields 
instead of just enforcing a primary key.


Even if this could be a small increase per record, even a few percent  
faster compounded over the whole load could be a significant reduction.


Thanks for any ideas you might have.

-Dan






Re: [PERFORM] [IMPORTANT] - My application performance

2005-07-26 Thread Christopher Kings-Lynne



Roberto Germano Vieweg Neto wrote:

My application is using Firebird 1.5.2

I have at my database:
- 150 Domains
- 318 tables
- 141 Views
- 365 Procedures
- 407 Triggers
- 75 generators
- 161 Exceptions
- 183 UDFs
- 1077 Indexes

My question is:

Will PostgreSQL be faster than Firebird? How much (in percent)?


I think you can probably expect around 10341.426% improvement.







ps. Yes, I am joking just in case...

