Re: [PERFORM] Running 9 in production? Sticking with 8.4.4 for a while?

2010-09-28 Thread Guy Rouillier

On 9/28/2010 4:45 PM, Greg Smith wrote:

Tory M Blue wrote:

I'm doing an OS upgrade and have been sitting on 8.4.3 for sometime. I
was wondering if it's better for the short term just to bring things
to 8.4.4 and let 9.0 bake a bit longer, or are people with large data
sets running 9.0 in production already?


I'm aware of two people with large data sets who have been running 9.0
in production since it was in beta. Like most code, what you have to
consider is how much the code path you expect to use each day has been
modified during the previous release cycle. If you're using 9.0 as a better
8.4, the odds of your running into a problem are on the low side of the
risk curve. But those using features that are both new and were worked on
until the very end of the development cycle, like the new replication
features, are much more likely to run into a bug.


A conservative approach is never to use version x.0 of *anything*.  The 
PG developers are very talented (and also very helpful on these mailing 
lists - thanks for that), but they are human.  For work I'm paid to do 
(as opposed to my own or charity work), I like to stay at least one 
point release behind the bleeding edge.


--
Guy Rouillier



Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Guy Rouillier

Kevin Grittner wrote:

Grzegorz Jaśkiewicz <gryz...@gmail.com> wrote:



A failing of the SQL standard is that it uses the same mark (NULL) both
to show the absence of a value because it is unknown and for the case
where it is known that no value exists (not applicable).  Codd argued
for a distinction there, but it hasn't come to pass, at least in the
standard.  If anyone could suggest a way to support standard syntax
and semantics and add extensions to support this distinction, it might
be another advance that would distinguish PostgreSQL from less
evolved products.   :-)


Theoretically, the distinction already exists.  If you don't know a 
person's middle initial, then set it to null; if you know the person 
doesn't have one, set it to the empty string.


But from a practical point of view, that wouldn't go very far.  Most 
*people* equate an empty string with null.  When I wrote 
my own data access layer years ago, I expressly checked for empty 
strings on input and changed them to null.  I did this because empty 
strings had a nasty way of creeping into our databases; writing queries 
to produce predictable results got to be very messy.
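
To make the distinction concrete, a minimal sketch (hypothetical table and 
column names) of the convention described above, plus the empty-string-to-null 
normalization the data access layer applied:

-- middle_initial is NULL when unknown, empty string when known to be absent.
CREATE TABLE person (
    id             integer PRIMARY KEY,
    first_name     text NOT NULL,
    middle_initial text,
    last_name      text NOT NULL
);

INSERT INTO person VALUES (1, 'John', NULL, 'Doe');  -- middle initial unknown
INSERT INTO person VALUES (2, 'Jane', '',   'Roe');  -- known to have none

-- The two cases stay distinguishable only if queries test each one explicitly.
SELECT count(*) FROM person WHERE middle_initial IS NULL;  -- the unknown case
SELECT count(*) FROM person WHERE middle_initial = '';     -- the known-absent case

-- The normalization described above: fold stray empty strings back to NULL.
UPDATE person SET middle_initial = NULL WHERE middle_initial = '';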


--
Guy Rouillier



Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Guy Rouillier

Craig James wrote:

Kevin Grittner wrote:

Which leaves the issue open -- a flexible way to flag the *reason* (or
*reasons*) for the absence of a value could be a nice enhancement, if
someone could invent a good implementation.  Of course, one could
always add a column to indicate the reason for a NULL; and perhaps
that would be as good as any scheme to attach reason flags to NULL. 
You'd just have to make sure the reason column was null capable for
those rows where there *was* a value, which would make the reason "not
applicable".


I'd argue that this is just a special case of a broader problem of 
metadata: data about the data.  For example, I could have a temperature, 
40 degrees, and an error bound, +/- 0.25 degrees.  Nobody would think 
twice about making these separate columns.  I don't see how this is any 
different from a person's middle initial of NULL, plus a separate column 
indicating "not known" versus "doesn't have one" if that distinction is 
important.  There are many examples like this, where a simple value in 
one column isn't sufficient, so another column contains metadata that 
qualifies or clarifies the information.  NULL is just one such case.


But, this should probably be on an SQL discussion board, not PG 
performance...


Most DBMSs I'm aware of use a null *byte* attached to a nullable column 
to indicate whether the column is null or not.  Yes/no takes only one 
*bit*, which leaves 255 other possible values to describe the state of 
the column.  That seems preferable to adding an additional column to 
every nullable column.


But as you say, that would have to be taken up with the SQL 
standardization bodies, and not PostgreSQL.
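
As a concrete sketch of the separate reason column Kevin and Craig describe 
(hypothetical table and column names, with the reason column itself nullable 
as Kevin notes):

CREATE TABLE person (
    id                    integer PRIMARY KEY,
    middle_initial        text,
    -- why middle_initial is NULL; itself NULL whenever a value is present
    middle_initial_status text
        CHECK (middle_initial_status IN ('unknown', 'not applicable'))
);

INSERT INTO person VALUES (1, 'Q',  NULL);              -- value known
INSERT INTO person VALUES (2, NULL, 'unknown');         -- value not yet known
INSERT INTO person VALUES (3, NULL, 'not applicable');  -- known to have none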


--
Guy Rouillier



Re: [PERFORM] Query plan for NOT IN

2009-10-05 Thread Guy Rouillier

Grzegorz Jaśkiewicz wrote:


Well, as a rule of thumb: unless you can't think of a default value for a 
column, don't use nulls. So using nulls as a default 'idunno' is bad 
practice, but everybody's opinion on that differs.


I don't understand this point of view.  The concept of null was 
introduced into the SQL vernacular by Codd and Date expressly to 
represent unknown values.
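
Since the thread began with a NOT IN query plan, it is worth illustrating how 
null-as-unknown interacts with NOT IN; a short sketch with hypothetical tables:

CREATE TABLE a (id integer);
CREATE TABLE b (id integer);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (2), (NULL);

-- Returns no rows: "1 NOT IN (2, NULL)" evaluates to unknown, not true,
-- because 1 <> NULL is unknown.
SELECT * FROM a WHERE id NOT IN (SELECT id FROM b);

-- NOT EXISTS treats the NULL row as simply "no match" and returns id = 1.
SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);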


--
Guy Rouillier



Re: [PERFORM] PostgreSQL vs Oracle

2008-12-21 Thread Guy Rouillier

Victor Nawothnig wrote:

Hi,

I am looking for some recent and hopefully genuine comparisons between
Oracle and PostgreSQL regarding their performance in large scale
applications. Tests from real world applications would be preferable
but not required. Also differentiations in different areas (i.e.
different data types, query structures, clusters, hardware, etc.)
might be helpful as well.


Victor, Oracle expressly forbids, in their license agreement, anyone 
from publishing performance comparisons between Oracle and any other 
product.  So you will rarely find anyone willing to publicly provide you 
any performance numbers.


Differences in data structures, etc., are fairly easy to determine. 
Anyone can read the Oracle documentation.


--
Guy Rouillier



Re: [PERFORM] Planning a new server - help needed

2008-03-30 Thread Guy Rouillier

PFC wrote:



Why do you claim that 'More platters also means slower seeks
and generally slower performance.'?


More platters -> more heads -> heavier head assembly -> slower seek 
time

But...
More platters -> higher density -> less seek distance (in mm of head 
movement) -> faster seek time


More platters means more tracks under the read heads at a time, so 
generally *better* performance.  All other things (like rotational 
speed) being equal, of course.


--
Guy Rouillier



Re: [PERFORM] Making the most of memory?

2008-01-23 Thread Guy Rouillier

Scott Marlowe wrote:

I assume you're talking about solid state drives?  They have their
uses, but for most use cases, having plenty of RAM in your server will
be a better way to spend your money.  For certain high throughput,
relatively small databases (i.e. transactional work) the SSD can be
quite useful.


Unless somebody has changed some physics recently, I'm not understanding 
the recent discussions of SSD in the general press.  Flash has a limited 
number of writes before it becomes unreliable.  On good quality consumer 
grade, that's about 300,000 writes, while on industrial grade it's about 
10 times that.  That's fine for mp3 players and cameras; even 
professional photographers probably won't rewrite the same spot on a 
flash card that many times in a lifetime.  But for database 
applications, 300,000 writes is trivial.  3 million will go a lot longer, 
but in non-archival applications, I imagine even that mark won't take 
more than a year or two to surpass.
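
As a rough back-of-envelope check (assuming, unrealistically, no wear leveling 
and a single hot block rewritten once per second):

-- days until one block rewritten every second reaches the write limit
SELECT 300000.0  / (60 * 60 * 24) AS consumer_grade_days;    -- about 3.5 days
SELECT 3000000.0 / (60 * 60 * 24) AS industrial_grade_days;  -- about 35 days

Even at much lower rewrite rates, those limits are within easy reach of a busy 
database.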


--
Guy Rouillier



Re: [PERFORM] Barcelona vs Tigerton

2007-09-11 Thread Guy Rouillier

Mindaugas wrote:

  Hello,

  Now that both 4x4s are out, it's time for us to decide which one should be 
better for our PostgreSQL and Oracle. Especially for Oracle, we really need such 
a server to squeeze everything from the Oracle licenses. Both databases handle 
an OLTP type of load.
  Since we plan to buy a 4U HP DL580 or 585, and only very few of them, power 
ratings are not very critical in our case.

  First benchmarks (http://www.anandtech.com/IT/showdoc.aspx?i=3091) show that 
Intel still has more raw CPU power, but Barcelona scales much better and also 
has better memory bandwidth, which I believe is quite critical with 16 cores and 
a DB usage pattern.
  On the other hand, Intel's X7350 (2.93 GHz) has almost a 50% advantage in CPU 
frequency over the 2 GHz Barcelona.


Barcelona was just announced yesterday.  I wouldn't want to be betting 
my business on it just yet. Plus, AMD usually is able to up the clock on 
their chips pretty well after they've got a few production runs under 
their belts.  If you've absolutely got to have something today, I'd say 
Intel would be a safer bet.  If you can afford to wait 3-4 months, then 
you'll benefit from some industry experience as well as production 
maturity with Barcelona, and can judge then which better fits your needs.


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-08-17 Thread Guy Rouillier
I originally posted the question below back in Dec 2006, and many 
helpful suggestions resulted.  Unfortunately, since this was a closet 
effort, my official duties pushed further exploration to the back 
burner, then I lost my original test environment.  So while I can no 
longer compare to BigDBMS, I've just made some discoveries that I 
thought others might find helpful.


The app (which I inherited) was implemented making exhaustive use of 
stored procedures.  All inserts and updates are done using procs.  When 
configuration changes produced no noticeable improvements in 
performance, I turned to the application architecture.  In a new 
environment, I updated an insert/update intensive part of the app to use 
embedded insert and update statements instead of invoking stored 
procedures that did the same work.  All the remaining code, database 
implementation, hardware, etc remains the same.


The results were significant.  Running a repeatable test set of data 
produced the following results:


With stored procs: 2595 seconds
With embedded inserts/updates: 991 seconds

So at least in this one scenario, it looks like the extensive use of 
stored procs is contributing significantly to long run times.
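
For anyone wanting to reproduce a comparison like this, the shape of the change 
was roughly the following (hypothetical names; the real procedures do more work 
than this):

CREATE TABLE stats_normalized (device_id integer, sample_ts timestamp, value bigint);

-- Before: every insert goes through a stored procedure.
CREATE FUNCTION insert_stat(p_device integer, p_ts timestamp, p_value bigint)
RETURNS void AS $$
BEGIN
    INSERT INTO stats_normalized (device_id, sample_ts, value)
    VALUES (p_device, p_ts, p_value);
END;
$$ LANGUAGE plpgsql;

SELECT insert_stat(42, '2007-08-17 10:00', 12345);

-- After: the application issues the equivalent statement directly.
INSERT INTO stats_normalized (device_id, sample_ts, value)
VALUES (42, '2007-08-17 10:00', 12345);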


Guy Rouillier wrote:
I don't want to violate any license agreement by discussing performance, 
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as 
BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with PostgreSQL 
for at least some of our Java applications.  As a proof of concept, I 
started with a high-volume (but conceptually simple) network data 
collection application.  This application collects files of 5-minute 
usage statistics from our network devices, and stores a raw form of 
these stats into one table and a normalized form into a second table. We 
are currently storing about 12 million rows a day in the normalized 
table, and each month we start new tables.  For the normalized data, the 
app inserts rows initialized to zero for the entire current day first 
thing in the morning, then throughout the day as stats are received, 
executes updates against existing rows.  So the app has very high update 
activity.


In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and 
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays 
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those 
familiar with these devices.)  The arrays are set up with RAID5.  So I'm 
working with a consistent hardware platform for this comparison.  I'm 
only processing a small subset of files (144.)


BigDBMS processed this set of data in 20000 seconds, with all foreign 
keys in place.  With all foreign keys in place, PG took 54000 seconds to 
complete the same job.  I've tried various approaches to autovacuum 
(none, 30-seconds) and it doesn't seem to make much difference.  What 
does seem to make a difference is eliminating all the foreign keys; in 
that configuration, PG takes about 30000 seconds.  Better, but BigDBMS 
still has it beat significantly.


I've got PG configured so that the system database is on disk array 
2, as are the transaction log files.  The default table space for the 
test database is disk array 3.  I've got all the reference tables (the 
tables to which the foreign keys in the stats tables refer) on this 
array.  I also store the stats tables on this array.  Finally, I put the 
indexes for the stats tables on disk array 4.  I don't use disk array 1 
because I believe it is a software array.


I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.





--
Guy Rouillier



Re: [PERFORM] How much ram is too much

2007-06-08 Thread Guy Rouillier

Dave Cramer wrote:
Is it possible that providing 128G of RAM is too much? Will other 
systems in the server bottleneck?


What CPU and OS are you considering?

--
Guy Rouillier



Re: [PERFORM] How much ram is too much

2007-06-08 Thread Guy Rouillier

Dave Cramer wrote:

It's an IBM x3850 using Red Hat Linux 4.0


I had to look that up; the web site says it is a 4-processor, dual-core (so 
8 cores) Intel Xeon system.  It also says "Up to 64GB DDR II ECC 
memory", so are you sure you can even get 128 GB RAM?


If you could, I'd expect diminishing returns from the Xeon northbridge 
memory access.  If you are willing to spend that kind of money on 
memory, you'd be better off with Opteron or Sparc.


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Guy Rouillier

Dave Cramer wrote:


On 6-Jan-07, at 11:32 PM, Guy Rouillier wrote:


Dave Cramer wrote:



The box has 3 GB of memory.  I would think that BigDBMS would be 
hurt by this more than PG.  Here are the settings I've modified in 
postgresql.conf:
As I said, you need to set shared_buffers to at least 750MB; this is 
the starting point, it can actually go higher. Additionally, effective 
cache should be set to 2.25 GB. Turning fsync off is not a real-world 
situation. Additional tuning of file systems can provide some gain; 
however, as Craig pointed out, some queries may need to be tweaked.


Dave, thanks for the hard numbers, I'll try them.  I agree turning 
fsync off is not a production option.  In another reply to my original 
posting, Alex mentioned that BigDBMS gets an advantage from its async 
IO.  So simply as a test, I turned fsync off in an attempt to open 
wide all the pipes.


Regarding shared_buffers=750MB, the last discussions I remember on 
this subject said that anything over 10,000 (8K buffers = 80 MB) had 
unproven benefits.  So I'm surprised to see such a large value 
suggested.  I'll certainly give it a try and see what happens.


That is 25% of your available memory. This is just a starting point. 
There are reports that going as high as 50% can be advantageous, however 
you need to measure it yourself.


Ok, I ran with the settings below, but with

shared_buffers=768MB
effective_cache_size=2048MB
fsync=on

This run took 29000 seconds.  I'm beginning to think configuration 
changes are not going to buy significant additional improvement.  Time 
to look at the app implementation.








autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

--
Guy Rouillier







--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Guy Rouillier

Ron wrote:
C= What file system are you using?  Unlike BigDBMS, pg does not have its 
own native one, so you have to choose the one that best suits your 
needs.  For update heavy applications involving lots of small updates 
jfs and XFS should both be seriously considered.


Ron, thanks for your ideas.  Many of them I've addressed in response to 
suggestions from others.  I wanted to address this one in particular. 
Unfortunately, I do not have the liberty to change file systems on this 
old Sun box.  All file systems are formatted Sun UFS.  BigDBMS is 
equally subject to whatever pluses or minuses can be attributed to this 
file system, so I'm thinking that this issue would be a wash between the 
two.


I've come to the conclusion that configuration changes to PG alone will 
not level the playing field.  My next step is to try to determine where 
the biggest payback will be in changing the implementation.


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-06 Thread Guy Rouillier

Craig A. James wrote:
I don't know if you have access to the application's SQL, or the time to 
experiment a bit, but unless your schema is trivial and your SQL is 
boneheaded simple, you're not going to get equal performance from 
Postgres until you do some analysis of your application under real-world 
conditions, and optimize the problem areas.


Craig, thanks for taking the time to think about this.  Yes, I have all 
the application source code, and all the time in the world, as I'm doing 
this experimentation on my own time.  The test hardware is old stuff no 
one intends to use for production work ever again, so I can use it as 
long as I want.


The application is fairly straightforward, but as you say, what is 
working okay with BigDBMS isn't working as well under PG.  I'm going to 
try other configuration suggestions made by others before I attempt 
logic changes.  The core logic is unchangeable; millions of rows of data 
in a single table will be updated throughout the day.  If PG can't 
handle high-volume updates well, this may be a brick wall.


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-06 Thread Guy Rouillier

Dave Cramer wrote:



The box has 3 GB of memory.  I would think that BigDBMS would be hurt 
by this more than PG.  Here are the settings I've modified in 
postgresql.conf:


As I said, you need to set shared_buffers to at least 750MB; this is the 
starting point, it can actually go higher. Additionally, effective cache 
should be set to 2.25 GB. Turning fsync off is not a real-world situation. 
Additional tuning of file systems can provide some gain; however, as 
Craig pointed out, some queries may need to be tweaked.


Dave, thanks for the hard numbers, I'll try them.  I agree turning fsync 
off is not a production option.  In another reply to my original 
posting, Alex mentioned that BigDBMS gets an advantage from its async 
IO.  So simply as a test, I turned fsync off in an attempt to open wide 
all the pipes.


Regarding shared_buffers=750MB, the last discussions I remember on this 
subject said that anything over 10,000 (8K buffers = 80 MB) had unproven 
benefits.  So I'm surprised to see such a large value suggested.  I'll 
certainly give it a try and see what happens.




autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-05 Thread Guy Rouillier
I've got back access to my test system.  I ran another test with the 
same input data set.  This time I put pg_xlog on a different RAID volume 
(the unused one that I suspect is a software RAID), and I turned 
fsync=off in postgresql.conf.  I left the rest of the configuration 
alone (all foreign keys removed, etc.).  Unfortunately, this only dropped 
elapsed time down to about 28000 seconds (from 30000), still 
significantly more than BigDBMS.  Additional info inline below.


Shoaib Mir wrote:

Here are my few recommendations that might help you:

- You will need to do table partitioning 
(http://www.postgresql.org/docs/current/static/ddl-partitioning.html) 
as you are storing quite a lot of data in one table per day.


I'm focusing on the detailed perspective for now.  The 144 files I'm 
processing represent not even two hours of data, so that surely wouldn't 
be split up.
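
For readers who do want to pursue the partitioning suggestion, the scheme the 
documentation of this era describes is inheritance-based; a minimal monthly 
sketch with hypothetical names:

CREATE TABLE stats (
    device_id integer,
    sample_ts timestamp NOT NULL,
    value     bigint
);

-- One child table per month; the CHECK constraints let constraint exclusion
-- skip partitions that cannot match a query's date range.
CREATE TABLE stats_2007_01 (
    CHECK (sample_ts >= '2007-01-01' AND sample_ts < '2007-02-01')
) INHERITS (stats);

CREATE TABLE stats_2007_02 (
    CHECK (sample_ts >= '2007-02-01' AND sample_ts < '2007-03-01')
) INHERITS (stats);

-- Inserts must be routed to the right child (by the application, a rule, or
-- a trigger); queries against the parent see all children.
SET constraint_exclusion = on;
SELECT count(*) FROM stats
WHERE sample_ts >= '2007-02-10' AND sample_ts < '2007-02-11';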




- You are using a RAID5 setup which is something that can also affect 
performance so switching to RAID1 might help you there, but again you 
have a RAID5 with 12 disks so hmm that shouldn't be that much of a problem.


Agreed.



- Have you done the tuning for postgresql.conf parameters? if not then 
you really need to do this for like checkpoint segments, random page 
cost, shared buffers, cache size, fsm pages, vacuum cost delay, 
work_mem, bgwriter etc etc. You can get good advice for tuning these 
parameters at -- http://www.powerpostgresql.com/PerfList/


The box has 3 GB of memory.  I would think that BigDBMS would be hurt by 
this more than PG.  Here are the settings I've modified in postgresql.conf:


autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'



- For autovacuuming you need to properly tune the thresholds so that 
vacuum and analyze are done at the right time without affecting database 
server performance. (You can find help for this at 
http://www.postgresql.org/docs/current/static/routine-vacuuming.html 
under "22.1.4 The auto-vacuum daemon".)


The real-life load on this database would be fairly constant throughout 
the day.  Stats from network devices are received every 15 minutes from 
each device, but they are staggered.  As a result, the database is 
almost constantly being updated, so there is no dead time to do vacuums.




- You will need to separate your transactional logs, i.e. the pg_xlog folder, 
to a different drive other than your database server drive. This can be 
done by creating symlinks for the pg_xlog folder.


Done, see opening remarks.  Unfortunately minor impact.



- I hope you are doing proper connection pool management, because good 
use of database connections can really affect the overall 
performance; connections can be expensive to create, and consume memory 
if they are not properly closed.


I probably should have mentioned this originally but was afraid of 
information overload.  The application runs on JBoss and uses JBoss 
connection pools.  So connections are pooled, but I don't know how they 
would compare to native PG connection pools.  Essentially, JBoss gets 
native JDBC connections, and the pools simply allow them to be re-used 
without opening and closing each time.  So if the native PG connection 
pools provide any pooling optimizations beyond that, those advantages 
are not being realized.




Hope that helps your tests...


Thanks to everyone for providing suggestions, and I apologize for my 
delay in responding to each of them.





Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/28/06, *Guy Rouillier* [EMAIL PROTECTED] wrote:


I don't want to violate any license agreement by discussing
performance,
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as
BigDBMS here.

I'm trying to convince my employer to replace BigDBMS with PostgreSQL
for at least some of our Java applications.  As a proof of concept, I
started with a high-volume (but conceptually simple) network data
collection application.  This application collects files of 5-minute
usage statistics from our network devices, and stores a raw form of
these stats into one table and a normalized form into a second table.
We are currently storing about 12 million rows a day in the normalized
table, and each month we start new tables.  For the normalized data, the
app inserts rows initialized to zero for the entire current day first
thing in the morning, then throughout the day as stats are received,
executes updates against existing rows.  So the app has very high update
activity.

In my test environment, I have a dual-x86 Linux platform running the
application

Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-30 Thread Guy Rouillier

Tom Lane wrote:

Shoaib Mir [EMAIL PROTECTED] writes:

Here are my few recommendations that might help you:
[ snip good advice ]


Another thing to look at is whether you are doing inserts/updates as
individual transactions, and if so see if you can batch them to
reduce the per-transaction overhead.


Thank you everyone who replied with suggestions.  Unfortunately, this is 
a background activity for me, so I can only work on it when I can 
squeeze in time.  Right now, I can't do anything; I swapped out a broken 
switch in our network and the DB server is currently inaccessible ;(.  I 
will eventually work through all suggestions, but I'll start with the 
ones I can respond to without further investigation.


I'm not doing updates as individual transactions.  I cannot use the Java 
batch functionality because the code uses stored procedures to do the 
inserts and updates, and the PG JDBC driver cannot handle executing 
stored procedures in batch.  Briefly, executing a stored procedure 
returns a result set, and Java batches don't expect result sets.


So, in the code I turn autocommit off, and do a commit every 100 
executions of the stored proc.  The exact same code is running against 
BigDBMS, so any penalty from this approach should be evenly felt.
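
At the SQL level, the commit-every-100 approach amounts to wrapping batches of 
statements in a single transaction, which is exactly the per-transaction 
overhead Tom is pointing at.  A minimal sketch (hypothetical table; in the real 
app the statements are stored procedure calls):

CREATE TABLE stats_normalized (device_id integer, sample_ts timestamp, value bigint);

-- Autocommit: every statement is its own transaction and pays commit overhead.
INSERT INTO stats_normalized VALUES (42, '2006-12-30 10:00', 111);
INSERT INTO stats_normalized VALUES (42, '2006-12-30 10:05', 222);

-- Batched: one explicit transaction per ~100 statements, so the commit
-- overhead is paid once per batch instead of once per statement.
BEGIN;
INSERT INTO stats_normalized VALUES (42, '2006-12-30 10:00', 111);
INSERT INTO stats_normalized VALUES (42, '2006-12-30 10:05', 222);
-- ... roughly 100 statements ...
COMMIT;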


--
Guy Rouillier



[PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-27 Thread Guy Rouillier
I don't want to violate any license agreement by discussing performance, 
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as 
BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with PostgreSQL 
for at least some of our Java applications.  As a proof of concept, I 
started with a high-volume (but conceptually simple) network data 
collection application.  This application collects files of 5-minute 
usage statistics from our network devices, and stores a raw form of 
these stats into one table and a normalized form into a second table. 
We are currently storing about 12 million rows a day in the normalized 
table, and each month we start new tables.  For the normalized data, the 
app inserts rows initialized to zero for the entire current day first 
thing in the morning, then throughout the day as stats are received, 
executes updates against existing rows.  So the app has very high update 
activity.
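
The insert-then-update pattern looks roughly like this (hypothetical names; one 
row per device per 5-minute interval):

CREATE TABLE devices (device_id integer PRIMARY KEY);
INSERT INTO devices VALUES (42);

CREATE TABLE stats_normalized (
    device_id integer REFERENCES devices,
    sample_ts timestamp NOT NULL,
    value     bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (device_id, sample_ts)
);

-- First thing in the morning: pre-insert zeroed rows for every device and
-- every 5-minute interval of the current day (288 intervals).
INSERT INTO stats_normalized (device_id, sample_ts, value)
SELECT d.device_id,
       date_trunc('day', now()) + (s.i * interval '5 minutes'),
       0
FROM devices d,
     generate_series(0, 287) AS s(i);

-- Throughout the day, as stats files arrive, update the pre-inserted rows.
UPDATE stats_normalized
   SET value = 12345
 WHERE device_id = 42
   AND sample_ts = date_trunc('day', now()) + interval '10 minutes';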


In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and 
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays 
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those 
familiar with these devices.)  The arrays are set up with RAID5.  So I'm 
working with a consistent hardware platform for this comparison.  I'm 
only processing a small subset of files (144.)


BigDBMS processed this set of data in 20000 seconds, with all foreign 
keys in place.  With all foreign keys in place, PG took 54000 seconds to 
complete the same job.  I've tried various approaches to autovacuum 
(none, 30-seconds) and it doesn't seem to make much difference.  What 
does seem to make a difference is eliminating all the foreign keys; in 
that configuration, PG takes about 30000 seconds.  Better, but BigDBMS 
still has it beat significantly.


I've got PG configured so that the system database is on disk array 
2, as are the transaction log files.  The default table space for the 
test database is disk array 3.  I've got all the reference tables (the 
tables to which the foreign keys in the stats tables refer) on this 
array.  I also store the stats tables on this array.  Finally, I put the 
indexes for the stats tables on disk array 4.  I don't use disk array 1 
because I believe it is a software array.


I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.


--
Guy Rouillier

