Re: [PERFORM] Postgresql and Software RAID/LVM

2005-06-07 Thread Marty Scholes

John A Meinel wrote:


Isn't this actually more of a problem for the meta-data to give out in a
hardware situation? I mean, if the card you are using dies, you can't
just get another one.
With software raid, because the meta-data is on the drives, you can pull
it out of that machine, and put it into any machine that has a
controller which can read the drives, and a similar kernel, and you are
back up and running.


Probably true, if you have a similar kernel and hardware, if you can 
recover the state information, and if you know where that state 
information is stored.  Those are some very big "ifs" during a hectic disaster.



No, it hedges against *more* than one failure. But you can also do a
RAID1 over a RAID5 in software. But if you are honestly willing to
create a full RAID1, just create a RAID1 over RAID0. The performance is
much better. And since you have a full RAID1, as long as both drives of
a pairing don't give out, you can lose half of your drives.


True as well.  The problem with RAID1 over RAID0 is that, during a drive 
failure, you are one bad sector from disaster.  Further, RAID5 does 
automatic rebuild, whereas most RAID1 setups do not.  RAID5 reduces the 
amount of time that things are degraded, reducing the time that your 
data is in danger.



If you want the space, but you feel that RAID5 isn't redundant enough,
go to RAID6, which uses 2 parity locations, each with a different method
of storing parity, so not only is it more redundant, you have a better
chance of finding problems.


Agreed, RAID6 is the future, but still won't keep the server running 
when the RAID controller dies, or the SCSI/FC host adapter goes, or you 
want to upgrade controller firmware, or you want to replace the media, or...



So you are saying that you were able to replace the RAID controller
without turning off the machine? I realize there does exist
hot-swappable PCI cards, but I think you are overstating what you mean
by "fully operational". For instance, it's not like you can access your
data while it is being physically moved.


Detach mirror 1, uncable and move, recable and resync.  Detach mirror 2, 
uncable and move, recable and resync.




I do think you had some nice hardware. But I know you can do all of this
in software as well. It is usually a price/performance tradeoff. You
spend quite a bit to get a hardware RAID card that can keep up with a
modern CPU. I know we have an FC raid box at work which has a full 512MB
of cache on it, but it wasn't that much cheaper than buying a dedicated
server.


We run two Nexsan ATABoy2 arrays.  These can be found in 1 TB 
configurations for about $3,000 each, putting mirrored RAID5 storage at 
$6 per GB.  Is that a lot of money for storage?  Maybe.  In our case, 
that's dirt cheap protection against storage-related downtime.


Marty




Re: [PERFORM] Postgresql and Software RAID/LVM

2005-06-06 Thread Marty Scholes

> Has anyone run Postgres with software RAID or LVM on a production box?
> What has your experience been?

Yes, for a couple of years we have run Pg with software LVM mirroring 
against two hardware RAID5 arrays, on a production Sun box that 
runs 24/7.


My experience:
* Software RAID (other than mirroring) is a disaster waiting to happen. 
 If the metadata for the RAID set gives out for any reason (CMOS 
scrambles, card dies, power spike, etc.) then you are hosed beyond 
belief.  In most cases it is almost impossible to recover.  With 
mirroring, however, you can always boot and operate on a single mirror, 
pretending that no LVM/RAID is underway.  In other words, each mirror is 
a fully functional copy of the data which will operate your server.


* Hardware RAID5 is a terrific way to boost performance via write 
caching and spreading I/O across multiple spindles.  Each of our 
external arrays operates 14 drives (12 data, 1 parity and 1 hot spare). 
 While RAID5 protects against single spindle failure, it will not hedge 
against multiple failures in a short time period, SCSI controller 
failure, SCSI cable problems or even wholesale failure of the RAID 
controller.  All of these things happen in a 24/7 operation.  Using 
software RAID1 against the hardware RAID5 arrays hedges against any 
single failure.


* Software mirroring gives you a tremendous ability to change the system 
while it is running: take the mirror you wish to change offline, make the 
change, then resynchronize it.


On a fully operational production server, we have:
* restriped the RAID5 array
* replaced all RAID5 media with higher capacity drives
* upgraded RAID5 controller
* moved all data from an old RAID5 array to a newer one
* replaced host SCSI controller
* uncabled and physically moved storage to a different part of data center

Again, all of this has taken place (over the years) while our machine 
was fully operational.





Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-31 Thread Marty Scholes
> Tell me if I am wrong, but it sounds to me like
> an endless problem
Agreed.  Such it is with caching.  After doing some informal 
benchmarking with 8.0 under Solaris, I am convinced that our major choke 
point is WAL synchronization, at least for applications with a high 
commit rate.

We have noticed a substantial improvement in performance with 8.0 vs 
7.4.6.  All of the update/insert problems seem to have gone away, save 
WAL syncing.

I may have to take back what I said about indexes.
Olivier Sirven wrote:
On Friday, 21 January 2005 at 19:18, Marty Scholes wrote:
The indexes can be put on a RAM disk tablespace and that's the end of
index problems -- just make sure you have enough memory available.  Also
make sure that the machine can restart correctly after a crash: the
tablespace is dropped and recreated, along with the indexes.  This will
cause a machine restart to take some time.
Tell me if I am wrong, but it sounds to me like an endless problem.  This 
solution may work with small indexes (less than 4 GB), but what happens when 
the indexes grow?  Would you add more memory to your server?  There will be 
a point where you cannot add any more, so what's next?




Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-24 Thread Marty Scholes
Tatsuo,
I agree completely that vacuum falls apart on huge tables.  We could 
probably do the math and figure out what the ratio of updated rows per 
total rows is each day, but on a constantly growing table, that ratio 
gets smaller and smaller, making the impact of dead tuples in the table 
proportionately less and less.

If multi-version indexes are handled the same way as table rows, then 
the indexes will also suffer the same fate, if not worse.  For huge 
tables, the b-tree depth can get fairly large.  When a b-tree is of 
depth X and the machine holds the first Y levels of the b-tree in 
memory, then each table row selected requires a MINIMUM of (X-Y) disk 
accesses *before* the table row is accessed.  Substitute any numbers you 
want for X and Y, but you will find that huge tables require many index 
reads.

Index updates are even worse.  A table row update requires only a copy 
of the row.  An index update requires at least a copy of the leaf node, 
and possibly more nodes if nodes must be split or collapsed.  These 
splits and collapses can cascade, causing many nodes to be affected.

This whole process takes place for each and every index affected by the 
change, which is every index on the table when a row is added or 
deleted.  All of this monkeying around takes place above and beyond the 
simple change of the row data.  Further, each and every affected index 
page is dumped to WAL.

Assuming the indexes have the same MVCC properties as row data, then the 
indexes would get dead tuples at a rate far higher than that of the 
table data.

So yes, vacuuming is a problem on large tables.  It is a bigger problem 
for indexes.  On large tables, index I/O comprises most of the I/O mix.

Don't take my word for it.  Run a benchmark on Pg.  Then, soft-link the 
index files and the WAL directories to a RAM disk.  Rerun the benchmark 
and you will find that Pg is far faster, much faster than if only the data 
were on the RAM disk.

Marty
Tatsuo Ishii wrote:
IMO the bottleneck is not WAL but table/index bloat. Lots of updates
on large tables will produce lots of dead tuples. The problem is, there
is no effective way to reuse these dead tuples, since VACUUM on huge
tables takes a long time. 8.0 adds new vacuum delay
parameters. Unfortunately this does not help. It just makes the
execution time of VACUUM longer, which means more and more dead tuples
are being made while updating.
VACUUM probably works well for small to medium size tables, but not
for huge ones. I'm considering implementing "on the spot
salvaging of dead tuples".
--
Tatsuo Ishii

This is probably a lot easier than you would think.  You say that your 
DB will have lots of data, lots of updates and lots of reads.

Very likely the disk bottleneck is mostly index reads and writes, with 
some critical WAL fsync() calls.  In the grand scheme of things, the 
actual data is likely not accessed very often.

The indexes can be put on a RAM disk tablespace and that's the end of 
index problems -- just make sure you have enough memory available.  Also 
make sure that the machine can restart correctly after a crash: the 
tablespace is dropped and recreated, along with the indexes.  This will 
cause a machine restart to take some time.

After that, if the WAL fsync() calls are becoming a problem, put the WAL 
files on a fast RAID array, either a card or external enclosure, that 
has a good amount of battery-backed write cache.  This way, the WAL 
fsync() calls will flush quickly to the RAM and Pg can move on while the 
RAID controller worries about putting the data to disk.  With WAL, low 
access time is usually more important than total throughput.

The truth is that you could have this running for not much money.
Good Luck,
Marty

On Thursday, 20 January 2005 at 19:09, Bruno Almeida do Lago wrote:
> Could you explain to us what you have in mind for that solution? I mean,
> forget the PostgreSQL (or any other database) restrictions and explain to us
> what this hardware would look like. Where would the data be stored?
>
> I have something in mind for you, but first I need to understand your needs!

I just want to build a big database, as explained in my first mail.  At the
beginning we will have approx. 150,000,000 records, and each month we will add
about 4 to 8 million new rows in a constant flow during the day, while at the
same time web users access the database to read those data.
The stored data are quite close to the data stored by Google (we are not
making a Google clone, just a lot of data: many small values and some big
ones; that's why I'm comparing with Google for data storage).
Then we will have a search engine searching through those data.

Regarding the hardware, for the moment we have only a dual Pentium Xeon
2.8 GHz with 4 GB of RAM, and we saw that we had bad performance results, so
we are thinking about a new solution, maybe with several servers (server
design may vary from one to the other), to get a kind of cluster and better
performance.

Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-21 Thread Marty Scholes
Randolf,
You probably won't want to hear this, but this decision likely has 
nothing to do with brands, models, performance or applications.

You are up against a pro salesman who is likely very good at what he 
does.  Instead of spewing all sorts of "facts" and statistics at your 
client, the salesman is probably trying to figure out what is driving 
your client.  Do you know what is driving your client?  Why does he want 
to switch?  Why now?  Why not next quarter?  Why not last quarter?  Why 
does he want to do the application at all?

Forget the expected answers, e.g., "We need this application to enhance 
our competitiveness in the marketplace and increase the blah blah blah."

Why does YOUR CLIENT actually care about any of this?  Is he trying to 
impress his boss?  Build his career?  Demonstrate that he can manage a 
significant project?  Is he trying to get rid of old code from an 
ex-coworker that he hated?  Is it spite?  Pride?  Is he angling for a 
bigger budget next year?  Is there someone who will be assigned to this 
project that your client wants to lord over?

The list goes on and on, and there is no way that your client is going 
to admit the truth and say something like, "The real reason I want to do 
this right now is that my childhood rival at XYZ corp just did a project 
like this.  I need to boost my ego, so I *MUST* do a bigger project, 
right now."

You gotta read between the lines.  How important is this and why?  How 
urgent and why?  Who all is behind this project?  What are each 
individual's personal motivations?  Does anyone resent a leader on the 
team and secretly wish for this project to fail?

Once you know what is actually going on in people's heads, you can begin 
to build rapport and influence them.  You can establish your own 
credibility and safety with your solution, while planting seeds of doubt 
about another solution.

At its core, this decision is (very likely) not at all about RDBMS 
performance or anything else related to computing.

Have you asked yourself why you care about one solution over another? 
What's driving you to push Pg over MS?  Why?  You might want to start 
answering those questions before you even talk to your client.

Good Luck,
Marty
Randolf Richardson wrote:
 I'm looking for recent performance statistics on PostgreSQL vs. Oracle
vs. Microsoft SQL Server.  Recently someone has been trying to convince my
client to switch from SyBASE to Microsoft SQL Server (they originally 
wanted
to go with Oracle but have since fallen in love with Microsoft).  All this
time I've been recommending PostgreSQL for cost and stability (my own 
testing
has shown it to be better at handling abnormal shutdowns and using fewer
system resources) in addition to true cross-platform compatibility.

If I can show my client some statistics proving that PostgreSQL outperforms
these (I'm more concerned about it beating Oracle because I know that
Microsoft's stuff is always slower, but I need the information anyway to
protect my client from falling victim to a 'sales job'), then PostgreSQL 
will
be the solution of choice as the client has always believed that they 
need a
high-performance solution.

I've already convinced them on the usual price, cross-platform
compatibility, open source, long history, etc. points, and I've been 
assured
that if the performance is the same or better than Oracle's and Microsoft's
solutions that PostgreSQL is what they'll choose.

Thanks in advance.





Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-21 Thread Marty Scholes
This is probably a lot easier than you would think.  You say that your 
DB will have lots of data, lots of updates and lots of reads.

Very likely the disk bottleneck is mostly index reads and writes, with 
some critical WAL fsync() calls.  In the grand scheme of things, the 
actual data is likely not accessed very often.

The indexes can be put on a RAM disk tablespace and that's the end of 
index problems -- just make sure you have enough memory available.  Also 
make sure that the machine can restart correctly after a crash: the 
tablespace is dropped and recreated, along with the indexes.  This will 
cause a machine restart to take some time.
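
As a concrete sketch of what I mean under 8.0 (the RAM disk mount point and 
the table and index names here are made up for illustration; remember that 
after a crash the tablespace and its indexes must be dropped and recreated):

CREATE TABLESPACE ram_idx LOCATION '/ramdisk/pg_idx';
CREATE INDEX orders_cust_idx ON orders (customer_id) TABLESPACE ram_idx;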

After that, if the WAL fsync() calls are becoming a problem, put the WAL 
files on a fast RAID array, either a card or external enclosure, that 
has a good amount of battery-backed write cache.  This way, the WAL 
fsync() calls will flush quickly to the RAM and Pg can move on while the 
RAID controller worries about putting the data to disk.  With WAL, low 
access time is usually more important than total throughput.

The truth is that you could have this running for not much money.
Good Luck,
Marty
On Thursday, 20 January 2005 at 19:09, Bruno Almeida do Lago wrote:
 > Could you explain to us what you have in mind for that solution? I mean,
 > forget the PostgreSQL (or any other database) restrictions and explain to us
 > what this hardware would look like. Where would the data be stored?
 >
 > I have something in mind for you, but first I need to understand your needs!

I just want to build a big database, as explained in my first mail.  At the
beginning we will have approx. 150,000,000 records, and each month we will add
about 4 to 8 million new rows in a constant flow during the day, while at the
same time web users access the database to read those data.
The stored data are quite close to the data stored by Google (we are not
making a Google clone, just a lot of data: many small values and some big
ones; that's why I'm comparing with Google for data storage).
Then we will have a search engine searching through those data.

Regarding the hardware, for the moment we have only a dual Pentium Xeon
2.8 GHz with 4 GB of RAM, and we saw that we had bad performance results, so
we are thinking about a new solution, maybe with several servers (server
design may vary from one to the other), to get a kind of cluster and better
performance.

Am I clear?
Regards,




[PERFORM] Disk performance, was Re: tablespaces and DB administration

2004-06-01 Thread Marty Scholes
This was a lively debate on what is faster, single spindles or RAID.
This is important because I keep running into people who do not 
understand the performance dynamics of an RDBMS like Oracle or Pg.

Pg and Oracle make a zillion tiny reads and writes and fsync() 
regularly.  If your drive will copy a 4GB file at a sustained rate of 72 
MB/s, that tells you nothing about how it will do with an RDBMS.

I will throw in my experience on RAID vs spindles.
With the RAID write cache disabled, a well balanced set of spindles will 
kill a RAID system any day.

Enable the cache, and the RAID starts inching ahead.  My experience is 
that no one can continuously keep I/O properly balanced across several 
spindles on a production system.  Things change and the I/O mix changes. 
 Then, the RAID is outperforming the spindles.  If you want to spend 
the rest of your career constantly balancing I/O across spindles, then 
do so.

For the rest of us, with a write cache, a hardware RAID wins hands down 
over the long haul.

It might make sense to provide some sort of benchmarking tool for 
various systems so that we can predict I/O performance.

Run the following code both on a hardware RAID and on a single spindle.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>

/* Write 4096 8 KB blocks, first sequentially and then at pseudo-random
 * offsets within the file.  If fs is nonzero, fsync() after every write,
 * which is roughly what Pg and Oracle do. */
void makefile(int fs)
{
    int     i;
    char    buf[8192];
    int     ld;
    int     blocks=4096;
    int     pos;
    int     secs;
    time_t  stim;
    time_t  etim;
    float   avg;

    memset(buf, 0, sizeof(buf));
    unlink("dump.out");
    ld=open("dump.out", O_WRONLY | O_CREAT, 0644);

    printf("Writing %d blocks sequentially\n", blocks);
    time(&stim);
    // sequential write loop: 8 KB writes, optional fsync() per write
    for (i=0; i<blocks; i++) {
        write(ld, buf, sizeof(buf));
        if (fs)
            fsync(ld);
    }
    time(&etim);
    secs = (int)(etim - stim);
    if (secs < 1)
        secs = 1;
    avg = (float)blocks / (float)secs;
    printf("Took %d seconds, avg %f iops\n", secs, avg);

    // purge the write cache
    fsync(ld);

    printf("Writing %d blocks (somewhat randomly)\n", blocks);
    time(&stim);
    // random write loop: seek to a random 8 KB block, write, optional fsync()
    for (i=0; i<blocks; i++) {
        pos = (rand() % blocks) * sizeof(buf);
        lseek(ld, pos, SEEK_SET);
        write(ld, buf, sizeof(buf));
        if (fs)
            fsync(ld);
    }
    time(&etim);
    secs = (int)(etim - stim);
    if (secs < 1)
        secs = 1;
    avg = (float)blocks / (float)secs;
    printf("Took %d seconds, avg %f iops\n", secs, avg);

    close(ld);
    unlink("dump.out");
}

int main()
{
    printf("No fsync()\n");
    makefile(0);
    printf("With fsync()\n");
    makefile(1);
    return 0;
}
The first operation shows how well the OS write cache is doing.  The 
second shows how poorly everything runs with fsync(), which is what Pg 
and Oracle do.

My RAID produced the following, but was also running production when I 
ran it:

No fsync()
Writing 4096 blocks sequentially
Took 1 seconds, avg 4096.00 iops
Writing 4096 blocks (somewhat randomly)
Took 4 seconds, avg 1024.00 iops
With fsync()
Writing 4096 blocks sequentially
Took 40 seconds, avg 102.42 iops
Writing 4096 blocks (somewhat randomly)
Took 66 seconds, avg 62.060608 iops
When I ran this on a decent fibre channel drive, I got:
No fsync()
Writing 4096 blocks sequentially
Took 1 seconds, avg 4096.00 iops
Writing 4096 blocks (somewhat randomly)
Took 7 seconds, avg 585.142883 iops
With fsync()
Writing 4096 blocks sequentially
Took 106 seconds, avg 38.641510 iops
Writing 4096 blocks (somewhat randomly)
Took 115 seconds, avg 35.617390 iops
You can see that the RAID array really helps out with small writes.



Re: [PERFORM] PostgreSQL caching

2004-05-25 Thread Marty Scholes
Vitaly,
This looks like there might be some room for performance improvement...
> MS> I didn't see the table structure, but I assume
> MS> that the vote_avg and
> MS> vote_count fields are in bv_bookgenres.
>
> I didn't understand you. vote_avg is stored in bv_books.
Ok.  That helps.  The confusion (on my end) came from the SELECT clause 
of the query you provided:

> SELECT bv_books. * ,
>    vote_avg,
>    vote_count
All fields from bv_books were selected (bv_books.*) along with vote_avg 
and vote_count.  My assumption was that vote_avg and vote_count were 
therefore not in bv_books.

At any rate, a query with an IN clause should help quite a bit:
SELECT bv_books. *
FROM   bv_books
WHERE  bv_books.book_id IN (
  SELECT book_id
  FROM bv_bookgenres
  WHERE bv_bookgenres.genre_id = 5830
  )
ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;
Give it a whirl.
Marty


Re: [PERFORM] PostgreSQL caching

2004-05-24 Thread Marty Scholes
> Hello Marty,
>
> MS> Is that a composite index?
>
> It is a regular btree index. What is a composite index?
My apologies.  A composite index is one that consists of multiple fields 
(aka multicolumn index).  The reason I ask is that it was spending 
almost half the time just searching bv_bookgenres, which seemed odd.
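
For example (guessing at the column names from the rest of this thread), the 
difference would be roughly:

CREATE INDEX i_bookgenres_genre_id ON bv_bookgenres (genre_id);            -- single column
CREATE INDEX i_bookgenres_genre_book ON bv_bookgenres (genre_id, book_id); -- composite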

I may be speaking out of turn since I am not overly familiar with Pg's 
quirks and internals.

A composite index, or any index of a large field, will lower the number 
of index items stored per btree node, thereby lowering the branching 
factor and increasing the tree depth.  On tables with many rows, this 
can result in many more disk accesses for reading the index.  An index 
btree that is 6 levels deep will require at least seven disk accesses (6 
for the index, one for the table row) per row retrieved.

Not knowing the structure of the indexes, it's hard to say too much 
about it.  The fact that a 1993 row select from an indexed table took 
3.5 seconds caused me to take notice.

> MS> I would be curious to see how it performs with an "IN" clause,
> MS> which I would suspect would go quite a bit faster.
>
> Actually it reached 20s before I canceled it... Here's the explain:
I believe that.  The code I posted had a nasty join bug.  If my math is 
right, the query was trying to return 1993*1993, or just under 4 million 
rows.

I didn't see the table structure, but I assume that the vote_avg and 
vote_count fields are in bv_bookgenres.  If no fields are actually 
needed from bv_bookgenres, then the query might be constructed in a way 
that only the index would be read, without loading any row data.

I think that you mentioned this was for a web app.  Do you actually have 
a web page that displays 2000 rows of data?

Good luck,
Marty


Re: [PERFORM] PostgreSQL caching

2004-05-21 Thread Marty Scholes
Not knowing a whole lot about the internals of Pg, one thing jumped out 
at me: each trip to get data from bv_books took 2.137 ms, which came to 
over 4.2 seconds right there.

The problem "seems" to be the 1993 times that the nested loop spins, as 
almost all of the time is spent there.

Personally, I am amazed that it takes 3.585 seconds to index scan 
i_bookgenres_genre_id.  Is that a composite index?  Analyzing the 
tables may help, as the optimizer appears to mispredict the number of 
rows returned.

I would be curious to see how it performs with an "IN" clause, which I 
would suspect would go quite a bit faster.  Try the following:

SELECT bv_books. * ,
   vote_avg,
   vote_count
FROM   bv_bookgenres,
   bv_books
WHERE  bv_books.book_id IN (
  SELECT book_id
  FROM bv_genres
  WHERE bv_bookgenres.genre_id = 5830
  )
AND bv_bookgenres.genre_id = 5830
ORDER BY   vote_avg DESC LIMIT 10 OFFSET 0;
In this query, all of the book_id values are pulled at once.
Who knows?
If you get statistics on this, please post.
Marty


Re: [PERFORM] Hardware Platform

2004-05-20 Thread Marty Scholes
Duane wrote:
> P.S. I've only just begun using PostgreSQL after having
> used (and still using) DB2 on a mainframe for the past 14
> years.  My experience with Unix/Linux is limited to some
> community college classes I've taken but we do have
> a couple of experienced Linux sysadmins on our team.
> I tell you this because my "ignorance" will probably
> show more than once in my inquiries.
Duane,
If you've been actively using and developing in DB2, presumably under 
MVS or whatever big blue is calling it these days, for 14 years, then 
you will bring a wealth of big system expertise to Pg.

Please stay involved and make suggestions where you think Pg could be 
improved.

Marty


Re: [PERFORM] Quad processor options

2004-05-18 Thread Marty Scholes
After reading the replies to this, it is clear that this is a 
Lintel-centric question, but I will throw in my experience.

> I am curious if there are any real life production
> quad processor setups running postgresql out there.
Yes.  We are running a 24/7 operation on a quad CPU Sun V880.
> Since postgresql lacks a proper replication/cluster
> solution, we have to buy a bigger machine.
This was a compelling reason for us to stick with SPARC and avoid 
Intel/AMD when picking a DB server.  We moved off of an IBM mainframe in 
1993 to Sun gear and never looked back.  We can upgrade to our heart's 
content with minimal disruption and are only on our third box in 11 
years with plenty of life left in our current one.

> Right now we are running on a dual 2.4 Xeon, 3 GB Ram
> and U160 SCSI hardware-raid 10.
A couple people mentioned hardware RAID, which I completely agree with. 
 I prefer an external box with a SCSI or FC connector.  There are no 
driver issues that way.  We boot from our arrays.

The Nexsan ATABoy2 is a nice blend of performance, reliability and cost. 
 Some of these with 1TB and 2TB of space were recently spotted on ebay 
for under $5k.  We run a VERY random i/o mix on ours and it will 
consistently sustain 15 MB/s in blended read and write i/o, sustaining 
well over 1200 io/s.  These are IDE drives, which fail more often than 
SCSI, so run RAID1 or RAID5.  The cache on these units pretty much 
eliminates the RAID5 write penalty.

> The 30k+ setups from Dell etc. don't fit our budget.
For that kind of money you could get a lower end Sun box (or IBM RS/6000 
I would imagine) and give yourself an astounding amount of headroom for 
future growth.

Sincerely,
Marty


Re: [PERFORM] Scaling further up

2004-03-13 Thread Marty Scholes
I have some suggestions based on my anecdotal experience.

1. This is a relatively small DB -- the working set will likely be in 
RAM at any moment in time, making read I/O time mostly irrelevant.

2. The killer will be write times -- specifically log writes.  Small and 
heavily synchronized writes, log and data writes, will drag down an 
impressive hardware RAID setup.  We run mirrored hardware RAID 5 arrays 
with write back cache and are constantly seeking ways to improve write 
performance.  We do a lot of batch processing, though, so we do a lot of 
write I/Os.

3. Be very careful with "battery backed write cache."  It usually works 
as advertised.  More than once in the past decade I have seen 
spontaneous cache corruption after power loss.  The corruption usually 
happens when some admin, including me, has assumed that the cache will 
ALWAYS survive a power failure unblemished and has no "plan B."  Make 
sure you have a contingency plan for corruption, or don't enable the cache.

4. RAID 10 will likely have bigger stripe sizes on the RAID 0 portion of 
the setup, and might hinder, not help small write I/O performance.

5. Most (almost all) of the I/O time will be due to the access time 
(head seek + head settle + rotational latency) and very little of the 
I/O time will due to data transfer time.  In other words, getting drives 
that provide faster transfer rates will barely improve performance.  The 
secret is lowering the access time.

6. A relatively cheap way to drastically drop the access time is to get 
large drive(s) and only use a portion of them for storage.  The less 
space used on the drive, the less area the heads need to cover for 
seeks.  At one extreme, you could make the partition the size of a 
single cylinder.  This would make access time (ignoring OS and 
controller overhead) identical to rotational latency, which is as low as 
4.2 ms for a cheap 7200 RPM drive.

7. A drive with a 5 ms average service time, servicing 8 KB blocks, will 
yield as much as 1.6 MB/s of sustained write throughput (the arithmetic 
behind this and the 4.2 ms figure above is sketched after this list).  Not 
bad for a cheap uncached solution.  Any OS aggregation of writes during the 
fsync() call will further improve this number -- it is basically a lower 
bound for throughput.

8. Many people, especially managers, cannot stomach buying disk space 
and only using a portion of it.  In many cases, it seems more palatable 
to purchase a much more expensive solution to get to the same speeds.
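
The rough arithmetic behind the numbers in points 6 and 7 (assumed figures, 
not measurements; this runs as-is in psql):

SELECT 60.0 / 7200 / 2 * 1000 AS avg_rotational_latency_ms,  -- ~4.2 ms
       1000.0 / 5             AS writes_per_sec,             -- 200 at 5 ms each
       1000.0 / 5 * 8 / 1024  AS mb_per_sec;                 -- ~1.6 with 8 KB blocks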

Good luck.

scott.marlowe wrote:
On Wed, 3 Mar 2004, Paul Thomas wrote:

 >
 > On 02/03/2004 23:25 johnn wrote:
 > > [snip]
 > > random_page_cost should be set with the following things taken into
 > > account:
 > >   - seek speed
 >
 > Which is not exactly the same thing as spindle speed, as it's a combination
 > of spindle speed and track-to-track speed. I think you'll find that a 15K
 > rpm disk, whilst it will probably have a lower seek time than a 10K rpm
 > disk, won't have a proportionately (i.e., 2/3rds) lower seek time.

There are three factors that affect how fast you can get to the next
sector:
seek time
settle time
rotational latency
Most drives only list the first, and don't bother to mention the other
two.
On many modern drives, the seek times are around 5 to 10 milliseconds.
The settle time varies as well.  The longer the seek, the longer the
settle, generally.  This is the time it takes for the head to stop shaking
and rest quietly over a particular track.
Rotational latency is the amount of time you have to wait, on average, for
the sector you want to come under the heads.
Assuming an 8 ms seek and a 2 ms settle (typical numbers), and that the
rotational latency on average is 1/2 of a rotation:  at 10k rpm, a
rotation takes 1/166.667 of a second, or 6 ms, so half a rotation is
approximately 3 ms.  Going to a 15k rpm drive drops the latency to 2 ms.
So, if we add them up on the same basic drive, one being 10k and one
being 15k, we get:
10k rpm: 8+2+3 = 13 ms
15k rpm: 8+2+2 = 12 ms
So, based on the decrease in rotational latency being the only advantage
the 15k rpm drive has over the 10k rpm drive, we get a decrease in access
time of only 1 ms, or only about an 8% decrease in actual access time.
So, if your random_page_cost on 10k rpm drives was 1.7, you'd need to
drop it to 1.57 or so to reflect the speed increase from 15k rpm drives.
I.e. it's much more likely that going from 1 gig to 2 gigs of ram will
make a noticeable difference than going from 10k to 15k drives.






Re: [PERFORM] rapid degradation after postmaster restart

2004-03-12 Thread Marty Scholes
Six days ago I installed Pg 7.4.1 on Sparc Solaris 8 also.  I am hopeful 
that we as well can migrate a bunch of our apps from Oracle.

After doing some informal benchmarks and performance testing for the 
past week I am becoming more and more impressed with what I see.

I have seen similar results to what you are describing.

I found that running a full vacuum:

vacuumdb -fza

followed by a checkpoint makes it run fast again.
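
From psql, the rough equivalent (run it in each database; CHECKPOINT 
requires superuser) is:

VACUUM FULL ANALYZE;
CHECKPOINT;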

Try timing the update with and without a full vacuum.

I can't help but wonder if a clean shutdown includes some vacuuming.

Obviously, in a production database this would be an issue.

Please post back what you learn.

Sincerely,
Marty

Joe Conway wrote:
I'm trying to troubleshoot a performance issue on an application ported 
from Oracle to postgres. Now, I know the best way to get help is to post 
the schema, explain analyze output, etc, etc -- unfortunately I can't do 
that at the moment. However, maybe someone can point me in the right 
direction to figure this out on my own. That said, here are a few 
details...

PostgreSQL 7.4.1
bash-2.03$ uname -a
SunOS col65 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-280R
The problem is this: the application runs an insert, that fires off a 
trigger, that cascades into a fairly complex series of functions, that 
do a bunch of calculations, inserts, updates, and deletes. Immediately 
after a postmaster restart, the first insert or two take about 1.5 
minutes (undoubtedly this could be improved, but it isn't the main 
issue). However by the second or third insert, the time increases to 7 - 
9 minutes. Restarting the postmaster causes the cycle to repeat, i.e. 
the first one or two inserts are back to the 1.5 minute range.

Any ideas spring to mind? I don't have much experience with Postgres on 
Solaris -- could it be related to that somehow?

Thanks for any insights.

Joe


