Re: [GENERAL] What's size of your PostgreSQL Database?

2008-09-10 Thread Amber
 8. We have a master and a replica.  We have plans to move to a
 cluster/grid Soon(TM).  It's not an emergency and Postgres can easily
 handle and scale to a 3TB database on reasonable hardware ($30k).
 

I'd like to know what's your progress of choosing the cluster/grid solution, we 
are also looking for
an appropriate one, following is the our major factors of the ideal solution.

1. Some kind of MPP.
2. No single point of failure.
3. Convenient and multiple access interfaces.

And following the is the solutions we have examined:

1. Slony-I: Not a MPP solution, and using triggers to detect changes, which 
defects performance.
2. pgpool-II: Some kind of MPP, but join operations can't be done on multiple 
machines parallelly, that is it can't scale out well.
3. Sequoia : The same problem as pgpool-II, and the major access interface is 
JDBC.
 
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-09-10 Thread Amber
 Yahoo has a 2PB Postgres single instance Postgres database (modified
 engine), but the biggest pure Pg single instance I've heard of is 4TB.
 The 4TB database has the additional interesting property in that they've
 done none of the standard scalable architecture changes (such as
 partitioning, etc).  To me, this is really a shining example that even
 naive Postgres databases can scale to as much hardware as you're willing
 to throw at them.  Of course, clever solutions will get you much more
 bang for your hardware buck.

Can you share some ideas of the particular design of the 4T db, it sounds very 
interesting :) 
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-09-10 Thread Joshua Drake
On Wed, 10 Sep 2008 23:17:40 +0800
Amber [EMAIL PROTECTED] wrote:
 
 1. Some kind of MPP.
 2. No single point of failure.
 3. Convenient and multiple access interfaces.
 
 And following the is the solutions we have examined:

http://www.greenplum.com/

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-09-10 Thread Amber
Yes, we know both Greenplum and Netezza are  PostgreSQL based MPP solutions, 
but they are commercial packages.
I'd like to know are there open source ones, and I would suggest the PostgreSQL 
Team to start a MPP version of PostgreSQL.

--
From: Joshua Drake [EMAIL PROTECTED]
Sent: Wednesday, September 10, 2008 11:27 PM
To: Amber [EMAIL PROTECTED]
Cc: Mark Roberts [EMAIL PROTECTED]; pgsql-general@postgresql.org
Subject: Re: [GENERAL] What's size of your PostgreSQL Database?

 On Wed, 10 Sep 2008 23:17:40 +0800
 Amber [EMAIL PROTECTED] wrote:
 
 1. Some kind of MPP.
 2. No single point of failure.
 3. Convenient and multiple access interfaces.
 
 And following the is the solutions we have examined:
 
 http://www.greenplum.com/
 
 Joshua D. Drake
 
 
 -- 
 The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
 PostgreSQL Community Conference: http://www.postgresqlconference.org/
 United States PostgreSQL Association: http://www.postgresql.us/
 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
 
 
 
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-09-10 Thread Joshua Drake
On Wed, 10 Sep 2008 23:33:44 +0800
Amber [EMAIL PROTECTED] wrote:

 Yes, we know both Greenplum and Netezza are  PostgreSQL based MPP
 solutions, but they are commercial packages. I'd like to know are
 there open source ones, and I would suggest the PostgreSQL Team to
 start a MPP version of PostgreSQL.

To my knowledge there are no open source MPP versions of PostgreSQL,
further AFAIK MPP is not on the OSS PostgreSQL roadmap.

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-21 Thread Amber


 On Sat, 2008-08-16 at 11:42 +0800, Amber wrote: 
 Dear all:
 We are currently considering using PostgreSQL to host a read only 
 warehouse, we would like to get some experiences, best practices and 
 performance metrics from the user community, following is the question list:
 1. What's size of your database?
 2. What Operating System are you using?
 3. What level is your RAID array?
 4. How many cores and memory does your server have?
 5. What about your performance of join operations?
 6. What about your performance of load operations?
 7. How many concurrent readers of your database, and what's the average 
 transfer rate, suppose all readers are doing one table scaning.
 8. Single instance or a cluster, what cluster software are you using if you 
 have a cluster?
 
 Thank you in advance!
 
 1. 2.5-3TB, several others that are of fractional sisize.


   How many CPU cores and memory does your server have :) 
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-21 Thread Amber
Another question, how many people are there maintaining this huge database.
We have about 2T of compressed SAS datasets, and now considering load them into 
a RDBMS database,
according to your experience, it seems a single PostgreSQL  instance can't 
manage such size databases well, it that right?

--
From: Amber [EMAIL PROTECTED]
Sent: Thursday, August 21, 2008 9:51 PM
To: Mark Roberts [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] What's size of your PostgreSQL Database?

 
 
 On Sat, 2008-08-16 at 11:42 +0800, Amber wrote: 
 Dear all:
 We are currently considering using PostgreSQL to host a read only 
 warehouse, we would like to get some experiences, best practices and 
 performance metrics from the user community, following is the question list:
 1. What's size of your database?
 2. What Operating System are you using?
 3. What level is your RAID array?
 4. How many cores and memory does your server have?
 5. What about your performance of join operations?
 6. What about your performance of load operations?
 7. How many concurrent readers of your database, and what's the average 
 transfer rate, suppose all readers are doing one table scaning.
 8. Single instance or a cluster, what cluster software are you using if you 
 have a cluster?
 
 Thank you in advance!
 
 1. 2.5-3TB, several others that are of fractional sisize.
 
 
   How many CPU cores and memory does your server have :) 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general
 
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-21 Thread Mark Roberts

On Thu, 2008-08-21 at 22:17 +0800, Amber wrote:
 Another question, how many people are there maintaining this huge database.
 We have about 2T of compressed SAS datasets, and now considering load them 
 into a RDBMS database,
 according to your experience, it seems a single PostgreSQL  instance can't 
 manage such size databases well, it that right?

Yahoo has a 2PB Postgres single instance Postgres database (modified
engine), but the biggest pure Pg single instance I've heard of is 4TB.
The 4TB database has the additional interesting property in that they've
done none of the standard scalable architecture changes (such as
partitioning, etc).  To me, this is really a shining example that even
naive Postgres databases can scale to as much hardware as you're willing
to throw at them.  Of course, clever solutions will get you much more
bang for your hardware buck.

As for my personal experience, I'd say that the only reason that we're
currently running a dual Pg instance (Master/Replica/Hot Standby)
configuration is for report times.  It's really important to us to have
snappy access to our data warehouse.  During maintenance our site and
processes can easily be powered by the master database with some
noticeable performance degradation for the users.

The grid that we (I) am looking to build is coming out of changing
(yet ever static!) business needs: we're looking to immediately get 2x
the data volume and soon need to scale to 10x.  Couple this with
increased user load and the desire to make reports run even faster than
they currently do and we're really going to run up against a hardware
boundary.  Besides, writing grid/distributed databases is *fun*!

Uh, for a one sentence answer: A single Pg instance can absolutely
handle 2+ TB without flinching.

 How many CPU cores and memory does your server have :) 

My boss asked me not to answer the questions I missed... sorry.  I will
say that the hardware is pretty modest, but has good RAM and disk space.

-Mark


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-20 Thread Robert Gravsjö



Mark Roberts wrote:

1. 2.5-3TB, several others that are of fractional sisize.


...


5. They do pretty well, actually.  Our aggregate fact tables regularly
join to metadata tables and we have an average query return time of
10-30s.  We do make some usage of denormalized mviews for
chained/hierarchical metadata tables.

   

Just out of curiosity, how do you replicate that amount of data?

...
   



A few notes: our database really can be broken into a very typical ETL
database: medium/high input (write) volume with low latency access
required.  I can provide a developer's view of what is necessary to keep
a database of this size running, but I'm under no illusion that it's
actually a large database.

I'd go into more details, but I'd hate to be rambling.  If anyone's
actually interested about any specific parts, feel free to ask. :)
   
I'd be very interested in a developers view of running and maintaining a 
database this size.
Mostly what choices is made during development that might have been 
different on a smaller database.
I'm also curious about the maintenance needed to keep a database this 
size healthy over time.


Regards,
/roppert

Also, if you feel that we're doing something wrong, feel free to
comment there too. :)

-Mark


   


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-20 Thread Mark Roberts

 Just out of curiosity, how do you replicate that amount of data?

When I started working here, we used Slony-I to replicate our aggregate
fact tables.  A little over a year ago our data volume had grown to the
point that the Slony was regularly unable to keep up with the data
volume and around this time Slony hit us with some crazy rollback based
data loss bug.  We elected to move our aggregate fact tables off of
slony, but left metadata tables on Slony.

So I wrote a custom in house replication engine into our aggregation
process.  Replication is accomplished in parallel via piped copy
statements, and so far it's been fast enough to keep up with the data
volume.  Without getting into too much detail, an import process might
look like this:

- Obtain and transform data into a usable state
- Import Data, and earmark it to be reaggregated
- Reaggregate (summarize) the changed data
- Copy the new data to all databases in the replication set
- Integrate new data into the aggregate fact tables

 I'd be very interested in a developers view of running and maintaining a 
 database this size.
 Mostly what choices is made during development that might have been 
 different on a smaller database.
 I'm also curious about the maintenance needed to keep a database this 
 size healthy over time.

Ok, so all of this comes from what might be termed as an ETL Developer
point of view.  I pretty much only work on the data warehouse's
import/summarization process and look into performance issues.

Design:
- Natural keys are pretty well strictly forbidden.  Don't make a VARCHAR
your primary key, please. ;-)
- The Data determines partition criteria and the application layer is
partition aware.
- It's important to have aggregate tables to support common queries.
Joins are ok, but repeatedly aggregating thousands of rows together on
the fly really takes too much time.
- Aggregation processes have to be designed with care.
- Parallel processing is important, especially if you ever have to
reprocess large amounts of data (for example due to incorrect initial
data)



Maintenance:
- Autovacuum might be more trouble than it's worth.  We frequently have
mysteriously hung queries that are eventually traced back to being
blocked by Autovacuum.  The application layer knows exactly what and
when is changing... it could *probably* take over this duty.
- Pg upgrades are a major PITA and require absurd amounts of downtime
for the data processing part of the warehouse.
- Queries that have been working for long periods of time and suddenly
stop working or hang are usually broken by statistics issues.
- Partitioning is important because it allows the aggregate tables to be
clustered, backed up, and archived individually



If anyone wants to chat with me or direct me to resources about running
Postgres on distributed file systems (or other grid computing solutions)
please let me know.  Yes, I'm aware of the proprietary solutions, but
they've been ruled out on account of performance problems with
aggregates and absurd licensing costs.  Also, any grid computing
solution that I write can easily be applied across the company and thus
save us (quite literally) millions per year.

Again, if you'd like more information about any particular topic, just
ask. :)


-Mark


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-19 Thread David Wilson
On Fri, Aug 15, 2008 at 11:42 PM, Amber [EMAIL PROTECTED] wrote:
 Dear all:
We are currently considering using PostgreSQL to host a read only 
 warehouse, we would like to get some experiences, best practices and 
 performance metrics from the user community, following is the question list:

DB is ~650m rows across 10 tables and is currently around 160gb.

Running on Ubuntu, mostly because this db started out as a toy and it
was easy. It's done well enough thus far that it isn't worth the
hassle to replace it with anything else.

Currently only using Raid 0; the database can be regenerated from
scratch if necessary so we don't have to worry overmuch about disk
failures.

Machine is a quad-core Xeon 2.5 with 4g of RAM.

Our access pattern is a little odd; about half the database is wipe
and regenerated at approximately 1-2 month intervals (the regeneration
takes about 2 weeks); in between there's a nightly computation run
that creates a small amount of new data in two of the tables. Both the
regeneration and the addition of the new data depends very heavily on
many, many several table joins that generally involve about 50% of the
database at a time. We've been fairly pleased with the performance
overall, though it's taken some tweaking to get individual operations
to perform adequately.

I can't speak to pure load operations; all of our bulk-load style ops
are 4k-row COPY commands interspersed among a lot of big, complicated
aggregate queries- not exactly ideal from a cache perspective.

Concurrent readers are anywhere from 1-8, and we're not in a cluster.
Sequential transfer rate is usually a touch over 100mb/sec; we don't
have a lot of disks on this machine (though that may change oh how
some of our index scans long for more spindles).

The performance improvements made in the past few releases have been
incredibly helpful- and very much noticeable each time.

-- 
- David T. Wilson
[EMAIL PROTECTED]

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-19 Thread Bill Moran
In response to Ow Mun Heng [EMAIL PROTECTED]:

 On Mon, 2008-08-18 at 11:01 -0400, justin wrote:
  Ow Mun Heng wrote: 
   -Original Message-
   From: Scott Marlowe [EMAIL PROTECTED]
 
If you're looking at read only / read
mostly, then RAID5 or 6 might be a better choice than RAID-10.  But
RAID 10 is my default choice unless testing shows RAID-5/6 can beat
it.

   
   I'm loading my slave server with RAID-0 based on 3 IDE 7200 Drives.
   Is this worst off than a RAID 5 implementation?
 
  I see no problem using Raid-0 on a purely read only database where
  there is a copy of the data somewhere else. RAID 0 gives performance.
  If one of the 3 drives dies it takes the server down and lost of data
  will happen.  The idea behind RAID 1/5/6/10  is  if a drive does fail
  the system can keep going.Giving you time to shut down and replace
  the bad disk or if you have hot swappable just pull and replace.
 
 I'm looking for purely read-only performance and since I didn't have the
 bandwidth to do extensive testing, I didn't know whether a RAID1 or a
 Raid 0 will do the better job. In the end, I decided to go with RAID 0
 and now, I'm thinking if RAID1 will do a better job.

When talking about pure read performance, the basic rule is the more
spindles you can have active simultaneously, the better.  By that rule,
RAID 0 is the best, but you have to balance that with reliability.  If
you have 10 disks in a RAID 0, the chance of the entire system going
down because of a disk failure is 10x that of a single disk system --
is that acceptable?

In theory, you can have so many disks that the bottleneck moves to some
other location, such as the IO bus or memory or the CPU, but I've never
heard of that happening to anyone.  Also, you want to get fast, high-
quality disks, as 10 15,000 RPM disks are going to perform better than
10 7,200 RPM disks.

Another solution is RAM, if you can get enough RAM in the system to hold
your working set of data, then the speed of the disk is not really
relevant.  Of course, that's tough to do if you've got 3TB of data,
which I don't know if that's your case or not.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-19 Thread Ow Mun Heng
On Tue, 2008-08-19 at 02:28 -0400, David Wilson wrote:
 On Fri, Aug 15, 2008 at 11:42 PM, Amber [EMAIL PROTECTED] wrote:
  Dear all:
 We are currently considering using PostgreSQL to host a read only 
  warehouse, 
 we would like to get some experiences, best practices and performance metrics 
 from the 
 user community, following is the question list:

I didn't realise the initial questions from this and since I'm lazy to look for 
the original 
mail, I'll put in my 2 cents worth.

DB is a DSS type store instead of OLTP type. Heavily denormalised data.

Master is a celeron 1.7Ghz, 768MB ram, 2x500GB 7200rpm IDE RAID1(data)+
1 spare, 1x80GB (system).
Slave is a celeron 1.7Ghz, 1.5GB RAM, 3x160GB 7200rpm IDE RAID1(data),
1x160GB system

Max columns ~120
DB size is ~200+GB ~600+M (denormalised) rows in ~60+ tables
(partitioned and otherwise)

vacuum is done nightly in addition to turning on autovacuum.

I'm both IO and CPU constrainted. :-)

Denormalisation/ETL process is done on the master and only the final
product is shipped to the slave for read-only via slony.

I've got close to 8 indexes on each table (for bitmap scanning) 

Due to the denormalisation, gettin to the data is very snappy even based
on such a small server. (adding ram to the slave saw drastic
performance improvement over the initial 512MB)

Currently looking for an FOSS implementation of a Slice and Dice kind of
drilldown for reporting purposes. Tried a variety including pentaho, but
never been able to get it set-up.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-19 Thread Mark Roberts

On Tue, 2008-08-19 at 07:34 -0400, Bill Moran wrote:
 
 In theory, you can have so many disks that the bottleneck moves to
 some
 other location, such as the IO bus or memory or the CPU, but I've
 never
 heard of that happening to anyone.  Also, you want to get fast, high-
 quality disks, as 10 15,000 RPM disks are going to perform better than
 10 7,200 RPM disks.

I've personally experienced this happening.

-Mark


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-19 Thread Mark Roberts

On Sat, 2008-08-16 at 11:42 +0800, Amber wrote: 
 Dear all:
 We are currently considering using PostgreSQL to host a read only 
 warehouse, we would like to get some experiences, best practices and 
 performance metrics from the user community, following is the question list:
 1. What's size of your database?
 2. What Operating System are you using?
 3. What level is your RAID array?
 4. How many cores and memory does your server have?
 5. What about your performance of join operations?
 6. What about your performance of load operations?
 7. How many concurrent readers of your database, and what's the average 
 transfer rate, suppose all readers are doing one table scaning.
 8. Single instance or a cluster, what cluster software are you using if you 
 have a cluster?
 
 Thank you in advance!

1. 2.5-3TB, several others that are of fractional sisize.


...


5. They do pretty well, actually.  Our aggregate fact tables regularly
join to metadata tables and we have an average query return time of
10-30s.  We do make some usage of denormalized mviews for
chained/hierarchical metadata tables.

6. Load/copy operations are extremely performant.  We pretty well
constantly have 10+ concurrent load operations going with 2-3
aggregation processes.

7. About 50, but I'm not sure what the transfer rate is.

8. We have a master and a replica.  We have plans to move to a
cluster/grid Soon(TM).  It's not an emergency and Postgres can easily
handle and scale to a 3TB database on reasonable hardware ($30k).

A few notes: our database really can be broken into a very typical ETL
database: medium/high input (write) volume with low latency access
required.  I can provide a developer's view of what is necessary to keep
a database of this size running, but I'm under no illusion that it's
actually a large database.

I'd go into more details, but I'd hate to be rambling.  If anyone's
actually interested about any specific parts, feel free to ask. :)

Also, if you feel that we're doing something wrong, feel free to
comment there too. :)

-Mark


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-18 Thread Ow Mun Heng
-Original Message-
From: Scott Marlowe [EMAIL PROTECTED]
If you throw enough drives on a quality RAID controller at it you can
get very good throughput.  If you're looking at read only / read
mostly, then RAID5 or 6 might be a better choice than RAID-10.  But
RAID 10 is my default choice unless testing shows RAID-5/6 can beat
it.

I'm loading my slave server with RAID-0 based on 3 IDE 7200 Drives.
Is this worst off than a RAID 5 implementation?


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-18 Thread justin

Ow Mun Heng wrote:

-Original Message-
From: Scott Marlowe [EMAIL PROTECTED]
  

If you throw enough drives on a quality RAID controller at it you can
get very good throughput.  If you're looking at read only / read
mostly, then RAID5 or 6 might be a better choice than RAID-10.  But
RAID 10 is my default choice unless testing shows RAID-5/6 can beat
it.



I'm loading my slave server with RAID-0 based on 3 IDE 7200 Drives.
Is this worst off than a RAID 5 implementation?


  
I see no problem using Raid-0 on a purely read only database where there 
is a copy of the data somewhere else. RAID 0 gives performance.  If one 
of the 3 drives dies it takes the server down and lost of data will 
happen.  The idea behind RAID 1/5/6/10  is  if a drive does fail the 
system can keep going.Giving you time to shut down and replace the 
bad disk or if you have hot swappable just pull and replace.  I just 
went through failed drives on Email server a few months ago.  This a 
case where i told the client the server is 5 years old time to replace 
it about 3 months latter i get a call the server is really slow.  It 
turned out 1 of the drives in the RAID 10 had failed.   The client 
allowed me to order a new server at that point. 





Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-18 Thread Ow Mun Heng
On Mon, 2008-08-18 at 11:01 -0400, justin wrote:
 Ow Mun Heng wrote: 
  -Original Message-
  From: Scott Marlowe [EMAIL PROTECTED]

   If you're looking at read only / read
   mostly, then RAID5 or 6 might be a better choice than RAID-10.  But
   RAID 10 is my default choice unless testing shows RAID-5/6 can beat
   it.
   
  
  I'm loading my slave server with RAID-0 based on 3 IDE 7200 Drives.
  Is this worst off than a RAID 5 implementation?
  
  

 I see no problem using Raid-0 on a purely read only database where
 there is a copy of the data somewhere else. RAID 0 gives performance.
 If one of the 3 drives dies it takes the server down and lost of data
 will happen.  The idea behind RAID 1/5/6/10  is  if a drive does fail
 the system can keep going.Giving you time to shut down and replace
 the bad disk or if you have hot swappable just pull and replace.

I'm looking for purely read-only performance and since I didn't have the
bandwidth to do extensive testing, I didn't know whether a RAID1 or a
Raid 0 will do the better job. In the end, I decided to go with RAID 0
and now, I'm thinking if RAID1 will do a better job.

  

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-18 Thread Brent Wood
I have a db (tables with up to 5,000,000 records, up to 70 columns x 1,500,000 
records, around 50Gb of disk space  for the database (incl data, indexes, etc)

Most records have PostGIS geometry columns, which work very well.

For read performance this is on a (2 yr old) Linux box with 2x software RAID 0 
(striped) WD 10,000RPM  Raptor drives.

FWIW bonnie gives reads at about 150Mb/sec from the filesystem. We have been 
more than happy with performance.
though the 4Gb of RAM helps

For data security, pg_dump backs it up every second day onto another 250Gb 
drive on the box,  this is copied over the LAN to another server which is 
backed up to tape every day. 

It works for us :-)


Cheers,

  Brent Wood



 Ow Mun Heng [EMAIL PROTECTED] 08/19/08 4:00 PM 
On Mon, 2008-08-18 at 11:01 -0400, justin wrote:
 Ow Mun Heng wrote: 
  -Original Message-
  From: Scott Marlowe [EMAIL PROTECTED]

   If you're looking at read only / read
   mostly, then RAID5 or 6 might be a better choice than RAID-10.  But
   RAID 10 is my default choice unless testing shows RAID-5/6 can beat
   it.
   
  
  I'm loading my slave server with RAID-0 based on 3 IDE 7200 Drives.
  Is this worst off than a RAID 5 implementation?
  
  

 I see no problem using Raid-0 on a purely read only database where
 there is a copy of the data somewhere else. RAID 0 gives performance.
 If one of the 3 drives dies it takes the server down and lost of data
 will happen.  The idea behind RAID 1/5/6/10  is  if a drive does fail
 the system can keep going.Giving you time to shut down and replace
 the bad disk or if you have hot swappable just pull and replace.

I'm looking for purely read-only performance and since I didn't have the
bandwidth to do extensive testing, I didn't know whether a RAID1 or a
Raid 0 will do the better job. In the end, I decided to go with RAID 0
and now, I'm thinking if RAID1 will do a better job.

  

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-17 Thread Amber
 7. How many concurrent readers of your database, and what's the average 
 transfer rate, suppose all readers are doing one table scaning.
 
 Concurrent but idle connections in production are around 600.  Active
 connections at a time are in the dozens.  I can read at about 60 to 70
 Megs a second for random access and around 350 to 400 Megs a second
 for sequential reads.
 


I am not so familiar with PostgreSQL, it uses a one process per connection 
architecture, let me say one agent process per client , what I am wondering is 
how multiple agent process share page caches. In many other databases, client 
agents uses the multiple thread method, so they can share memory buffers within 
the same process, are there share-memory mechanisms between PostgreSQL agent 
processes?  
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-17 Thread Greg Smith

On Sun, 17 Aug 2008, Amber wrote:


what I am wondering is how multiple agent process share page caches.


The database allocates a block of shared memory (sized by the 
shared_buffers parameter) that all the client processes share for caching 
pages.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What's size of your PostgreSQL Database?

2008-08-16 Thread Scott Marlowe
On Fri, Aug 15, 2008 at 9:42 PM, Amber [EMAIL PROTECTED] wrote:
 Dear all:
We are currently considering using PostgreSQL to host a read only 
 warehouse, we would like to get some experiences, best practices and 
 performance metrics from the user community, following is the question list:
 1. What's size of your database?

Varies.  I've had reporting dbs in the low 100s of gigabytes.

 2. What Operating System are you using?

I've generally worked with Linux.  RHEL, Centos, or Ubuntu.

 3. What level is your RAID array?

For transactional, ALWAYS RAID 10.  For reporting sometimes RAID-5,
mostly RAID-10
The reporting server I built at my last company was a collection of
spare parts and ran a software RAID-10 over 4 150G sata drives.  It
routinely outran the Oracle RAC cluster with 14 drives in RAID 6
sitting next to it doing reports on the same data.

 4. How many cores and memory does your server have?

The reporting server from my last company had a single hyperthreaded
P4 and 4 Gig of ram.
Current transactional server runs on 8 opterons, with 32 Gigs of ram.

 5. What about your performance of join operations?

Always been pretty good.  Kind of a wide open question really.  I'd
say PostgreSQL's query planner is usually very smart planning complex
queries.  note that joins were never an issue, but I had to pay
attention to how I designed correlated subqueries and aggregate
queries.

 6. What about your performance of load operations?

Pretty much dependent on the hardware you're on.  I can replicate the
current ~15 Gig transactional db in about 15 or 20 minutes from one 8
core 16 drive machine to another.

 7. How many concurrent readers of your database, and what's the average 
 transfer rate, suppose all readers are doing one table scaning.

Concurrent but idle connections in production are around 600.  Active
connections at a time are in the dozens.  I can read at about 60 to 70
Megs a second for random access and around 350 to 400 Megs a second
for sequential reads.

 8. Single instance or a cluster, what cluster software are you using if you 
 have a cluster?

Two machines with one as slony master and the other as slony slave,
with the application doing weighted load balancing on reads between
the two.

The important thing about pgsql is how well it scales to work on
larger hardware.

If you throw enough drives on a quality RAID controller at it you can
get very good throughput.  If you're looking at read only / read
mostly, then RAID5 or 6 might be a better choice than RAID-10.  But
RAID 10 is my default choice unless testing shows RAID-5/6 can beat
it.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] What's size of your PostgreSQL Database?

2008-08-15 Thread Amber
Dear all:
We are currently considering using PostgreSQL to host a read only 
warehouse, we would like to get some experiences, best practices and 
performance metrics from the user community, following is the question list:
1. What's size of your database?
2. What Operating System are you using?
3. What level is your RAID array?
4. How many cores and memory does your server have?
5. What about your performance of join operations?
6. What about your performance of load operations?
7. How many concurrent readers of your database, and what's the average 
transfer rate, suppose all readers are doing one table scaning.
8. Single instance or a cluster, what cluster software are you using if you 
have a cluster?

Thank you in advance!

 
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general