Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-17 Thread Greg Smith

Andres Freund wrote:

An fsync() equals a barrier, so it has the effect of stopping
reordering around it; especially on systems with larger multi-disk
arrays, that's pretty expensive.
You can achieve surprising speedups, at least in my experience, by
forcing the kernel to start writing out pages *without enforcing
barriers* first and then later enforce a barrier to be sure it's
actually written out.


Standard practice on high performance systems with good filesystems and 
a battery-backed controller is to turn off barriers anyway.  That's one 
of the first things to tune on XFS for example, when you have a reliable 
controller.  I don't have enough data on ext4 to comment on tuning for 
it yet.


The sole purpose for the whole Linux write barrier implementation in my 
world is to flush the drive's cache, when the database does writes onto 
cheap SATA drives that will otherwise cache dangerously.  Barriers don't 
have any place on a serious system that I can see.  The battery-backed 
RAID controller you have to use to make fsync calls fast anyway can do 
some simple write reordering, but the operating system doesn't ever have 
enough visibility into what it's doing to make intelligent decisions 
about that anyway. 


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-17 Thread Greg Smith

Bruce Momjian wrote:

Scott Carey wrote:
  

Don't ever have WAL and data on the same OS volume as ext3.

...
One partition for WAL, one for data.  If using ext3 this is essentially
a performance requirement no matter how your array is set up underneath.



Do we need to document this?
  


Not for 9.0.  What Scott is suggesting is often the case, but not 
always; I can produce a counter example at will now that I know exactly 
which closets have the skeletons in them.  The underlying situation is 
more complicated due to some limitations to the whole spread 
checkpoint code that is turning really sour on newer hardware with 
large amounts of RAM.  I have about 5 pages of written notes on this 
specific issue so far, and that keeps growing every week.  That's all 
leading toward a proposed 9.1 change to the specific fsync behavior.  
And I expect to dump a large stack of documentation to support that 
patch that will address this whole area.  I'll put the whole thing onto 
the wiki as soon as my 9.0 related work settles down.





Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-17 Thread Andres Freund
On Tuesday 17 August 2010 10:29:10 Greg Smith wrote:
 Andres Freund wrote:
  An fsync() equals a barrier, so it has the effect of stopping
  reordering around it; especially on systems with larger multi-disk
  arrays, that's pretty expensive.
  You can achieve surprising speedups, at least in my experience, by
  forcing the kernel to start writing out pages *without enforcing
  barriers* first and then later enforce a barrier to be sure it's
  actually written out.
 
 Standard practice on high performance systems with good filesystems and
 a battery-backed controller is to turn off barriers anyway.  That's one
 of the first things to tune on XFS for example, when you have a reliable
 controller.  I don't have enough data on ext4 to comment on tuning for
 it yet.
 
 The sole purpose for the whole Linux write barrier implementation in my
 world is to flush the drive's cache, when the database does writes onto
 cheap SATA drives that will otherwise cache dangerously.  Barriers don't
 have any place on a serious system that I can see.  The battery-backed
 RAID controller you have to use to make fsync calls fast anyway can do
 some simple write reordering, but the operating system doesn't ever have
 enough visibility into what it's doing to make intelligent decisions
 about that anyway.
Even if we're not talking about a write barrier in the "ensure it's written
out of the cache" sense, it still stops the I/O scheduler from reordering. I
benchmarked it (custom app) and it was very noticeable on a bunch of different
systems (with a good BBU'd RAID).

Andres



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Greg Smith

Scott Carey wrote:

This is because an fsync on ext3 flushes _all dirty pages in the file system_ 
to disk, not just those for the file being fsync'd.
One partition for WAL, one for data.  If using ext3 this is essentially a performance requirement no matter how your array is set up underneath. 
  


Unless you want the opposite of course.  Some systems split out the WAL 
onto a second disk, only to discover checkpoint I/O spikes become a 
problem all of a sudden after that.  The fsync calls for the WAL 
writes keep the write cache for the data writes from ever getting too 
big.  This slows things down on average, but makes the worst case less 
stressful.  Free lunches are so hard to find nowadays...




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Andres Freund
On Mon, Aug 16, 2010 at 01:46:21PM -0400, Greg Smith wrote:
 Scott Carey wrote:
 This is because an fsync on ext3 flushes _all dirty pages in the file 
 system_ to disk, not just those for the file being fsync'd.
 One partition for WAL, one for data.  If using ext3 this is
 essentially a performance requirement no matter how your array is
 set up underneath.

 Unless you want the opposite of course.  Some systems split out the
 WAL onto a second disk, only to discover checkpoint I/O spikes
 become a problem all of a sudden after that.  The fsync calls for
 the WAL writes keep the write cache for the data writes from ever
 getting too big.  This slows things down on average, but makes the
 worst case less stressful.  Free lunches are so hard to find
 nowadays...
Or use -o sync. Or configure a ridiculously low dirty_memory amount
(which has a problem on large systems because 1% can still be too
much. Argh.)...

Andres



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Greg Smith

Andres Freund wrote:

Or use -o sync. Or configure a ridiculously low dirty_memory amount
(which has a problem on large systems because 1% can still be too
much. Argh.)...
  


-o sync completely trashes performance, and trying to set the 
dirty_ratio values to even 1% doesn't really work due to things like the 
congestion avoidance code in the kernel.  If you sync a lot more 
often, which putting the WAL on the same disk as the database 
accidentally does for you, that works surprisingly well at avoiding this 
whole class of problem on ext3.  A really good solution is going to take 
a full rewrite of the PostgreSQL checkpoint logic though, which will get 
sorted out during 9.1 development.  (cue dramatic foreshadowing music here)
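To put numbers on the 1% problem, here is a quick Python calculation of how much dirty page cache a percentage-based limit allows. It treats all of RAM as dirtyable, which overstates the kernel's real threshold (that is computed against free plus reclaimable memory), so read the result as an upper bound. The helper name is hypothetical.

```python
def dirty_limit_bytes(dirtyable_bytes, ratio_percent):
    """Upper bound on dirty page cache allowed by a dirty_ratio-style percentage."""
    return dirtyable_bytes * ratio_percent // 100


GIB = 2 ** 30
for ram_gib in (24, 128):
    limit = dirty_limit_bytes(ram_gib * GIB, 1)
    # Even the smallest integer setting leaves hundreds of MiB, or more, dirty.
    print(f"{ram_gib} GiB RAM at 1%: about {limit / 2**20:.0f} MiB may sit dirty")
```

Later kernels (2.6.29 onward, as far as I know) add vm.dirty_bytes and vm.dirty_background_bytes, which take absolute byte counts and sidestep the granularity problem; the CentOS 5 kernel discussed in this thread predates them.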




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Andres Freund
On Mon, Aug 16, 2010 at 04:13:22PM -0400, Greg Smith wrote:
 Andres Freund wrote:
 Or use -o sync. Or configure a ridiculously low dirty_memory amount
 (which has a problem on large systems because 1% can still be too
 much. Argh.)...

 -o sync completely trashes performance, and trying to set the
 dirty_ratio values to even 1% doesn't really work due to things like
 the congestion avoidance code in the kernel.  If you sync a lot
 more often, which putting the WAL on the same disk as the database
 accidentally does for you, that works surprisingly well at avoiding
 this whole class of problem on ext3.  A really good solution is
 going to take a full rewrite of the PostgreSQL checkpoint logic
 though, which will get sorted out during 9.1 development.  (cue
 dramatic foreshadowing music here)
-o sync works well enough for the data partition (surely not the WAL) if you
make the background writer less aggressive.

But yes. A new checkpointing logic + a new syncing logic
(prepare_fsync() earlier and then fsync() later) would be a nice
thing. Do you plan to work on that?

Andres



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Greg Smith

Andres Freund wrote:

A new checkpointing logic + a new syncing logic
(prepare_fsync() earlier and then fsync() later) would be a nice
thing. Do you plan to work on that?
  


The background writer already caches fsync calls into a queue, so the 
prepare step you think needs to be there already exists.  The problem 
is that the actual fsync calls happen in a tight loop.  That's what we're 
busy fixing.
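A rough Python sketch of the difference described above: queued fsync requests are deduplicated, then issued with pauses spread across a time window instead of back to back in a tight loop. checkpoint_sync and its parameters are hypothetical illustrations, not the background writer's actual code or the eventual 9.1 patch; fsync and sleep are injectable so the pacing can be checked without touching a disk.

```python
import os
import time


def checkpoint_sync(pending, window_s, fsync=os.fsync, sleep=time.sleep):
    """Hypothetical sketch: fsync each queued file once, spread over window_s seconds."""
    # Many buffer writes to the same file leave duplicate requests; one fsync
    # per file is enough. dict.fromkeys drops repeats and keeps first-seen order.
    queue = list(dict.fromkeys(pending))
    if not queue:
        return 0
    pause = window_s / len(queue)
    for i, fd in enumerate(queue):
        if i:
            # Gap between syncs gives the controller time to drain its cache.
            sleep(pause)
        fsync(fd)
    return len(queue)
```

Passing real file descriptors with the defaults performs actual fsync() calls; passing list.append for both hooks, as the checks below do, records the calls and pauses instead.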




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Andres Freund
On Mon, Aug 16, 2010 at 04:54:19PM -0400, Greg Smith wrote:
 Andres Freund wrote:
 A new checkpointing logic + a new syncing logic
 (prepare_fsync() earlier and then fsync() later) would be a nice
 thing. Do you plan to work on that?
 The background writer already caches fsync calls into a queue, so
 the prepare step you think needs to be there already exists.  The
 problem is that the actual fsync calls happen in a tight loop.  That's
 what we're busy fixing.
That doesn't help that much on many systems with a somewhat deep
queue. An fsync() equals a barrier, so it has the effect of stopping
reordering around it; especially on systems with larger multi-disk
arrays, that's pretty expensive.
You can achieve surprising speedups, at least in my experience, by
forcing the kernel to start writing out pages *without enforcing
barriers* first and then later enforce a barrier to be sure it's
actually written out. In a simplified case, this turns the multiple
barriers previously needed into a single one (in practice you want to
call fsync() anyway, but that's not a big problem if it's already
written out).
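A minimal Python sketch of the two-phase idea, assuming Linux: phase one asks the kernel to start writeback of every file through sync_file_range(2) with SYNC_FILE_RANGE_WRITE (no waiting, no cache flush), and phase two calls fsync() on each file. sync_file_range is reached through ctypes, the flag value 2 is taken from the Linux headers and should be re-checked, and flush_in_two_phases is a hypothetical name (as is prepare_fsync() in the message above). This is not PostgreSQL code.

```python
import ctypes
import ctypes.util
import os

# Value of SYNC_FILE_RANGE_WRITE in Linux <fcntl.h>; an assumption to re-check.
SYNC_FILE_RANGE_WRITE = 2


def _load_sync_file_range():
    """Return libc's sync_file_range(), or None where the platform lacks it."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        fn = libc.sync_file_range
    except (OSError, AttributeError, TypeError):
        return None
    fn.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    fn.restype = ctypes.c_int
    return fn


_sync_file_range = _load_sync_file_range()


def flush_in_two_phases(paths):
    """Hypothetical helper: start writeback on all files, then fsync each one."""
    fds = [os.open(path, os.O_RDWR) for path in paths]
    try:
        # Phase 1: queue writeback for every file without waiting on it.
        # offset=0, nbytes=0 means "from the start through end of file".
        # Return codes are ignored: this is only a hint, phase 2 is the guarantee.
        if _sync_file_range is not None:
            for fd in fds:
                _sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE)
        # Phase 2: durability. Most pages should already be in flight,
        # so each fsync() has less left to wait for.
        for fd in fds:
            os.fsync(fd)
    finally:
        for fd in fds:
            os.close(fd)
```

Where sync_file_range is unavailable, the sketch degrades to a plain fsync() loop, so it stays correct even though the speedup disappears.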

Andres



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-16 Thread Bruce Momjian
Scott Carey wrote:
 Don't ever have WAL and data on the same OS volume as ext3.
 
 If data=writeback, performance will be fine, data integrity will be ok
 for WAL, but data integrity will not be sufficient for the data
 partition.  If data=ordered, performance will be very bad, but data
 integrity will be OK.
 
 This is because an fsync on ext3 flushes _all dirty pages in the file
 system_ to disk, not just those for the file being fsync'd.
 
 One partition for WAL, one for data.  If using ext3 this is essentially
 a performance requirement no matter how your array is set up underneath.

Do we need to document this?

--
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-13 Thread Bruce Momjian
Greg Smith wrote:
  2) Should I configure the ext3 file system with noatime and/or 
  data=writeback or data=ordered?  My controller has a battery, the 
  logical drive has write cache enabled (write-back), and the physical 
  devices have write cache disabled (write-through).
 
 data=ordered is the ext3 default and usually a reasonable choice.  Using 
 writeback instead can be dangerous, I wouldn't advise starting there.  
 noatime is certainly a good thing, but the speedup is pretty minor if 
 you have a battery-backed write cache.

We recommend 'data=writeback' for ext3 in our docs:

http://www.postgresql.org/docs/9.0/static/wal-intro.html

Tip:  Because WAL restores database file contents after a crash,
journaled file systems are not necessary for reliable storage of the
data files or WAL files. In fact, journaling overhead can reduce
performance, especially if journaling causes file system data  to be
flushed to disk. Fortunately, data flushing during journaling can often
be disabled with a file system mount option, e.g. data=writeback on a
Linux ext3 file system. Journaled file systems do improve boot speed
after a crash. 

Should this be changed?



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-13 Thread Greg Smith

Bruce Momjian wrote:

We recommend 'data=writeback' for ext3 in our docs
  


Only for the WAL though, which is fine, and I think spelled out clearly 
enough in the doc section you quoted.  Ken's system has one big RAID 
volume, which means he'd be mounting the data files with 'writeback' 
too; that's the thing to avoid.
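The distinction above (writeback is acceptable for a WAL-only volume, not for the data files) is easy to check mechanically. A minimal Python sketch, assuming /proc/mounts-style input in which the ext3 data= mode appears in the options column; the mount points /pgwal and /pgdata and the helper name are hypothetical.

```python
def writeback_data_volumes(mounts_text, wal_mountpoints):
    """Return ext3 mount points using data=writeback that are not WAL-only.

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype != "ext3":
            continue
        if "data=writeback" in options.split(",") and mountpoint not in wal_mountpoints:
            flagged.append(mountpoint)
    return flagged
```

On a live system it could be fed the contents of /proc/mounts together with the set of WAL-only mount points; a single big volume like Ken's would always be flagged if mounted with writeback.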




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-12 Thread Justin Pitts
 As others said, RAID6 is RAID5 + a hot spare.

 No. RAID6 is NOT RAID5 plus a hot spare.

 The original phrase was that RAID 6 was like RAID 5 with a hot spare
 ALREADY BUILT IN.

Built-in, or not - it is neither. It is more than that, actually. RAID
6 is like RAID 5 in that it uses parity for redundancy and pays a
write cost for maintaining those parity blocks, but will maintain data
integrity in the face of 2 simultaneous drive failures.

In terms of storage cost, it IS like paying for RAID5 + a hot spare,
but the protection is better.

A RAID 5 with a hot spare built in could not survive 2 simultaneous
drive failures.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-08 Thread Scott Marlowe
On Sun, Aug 8, 2010 at 12:46 AM, Scott Carey sc...@richrelevance.com wrote:

 On Aug 5, 2010, at 4:09 PM, Scott Marlowe wrote:

 On Thu, Aug 5, 2010 at 4:27 PM, Pierre C li...@peufeu.com wrote:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little on
 RAID 6.

 As others said, RAID6 is RAID5 + a hot spare.

 Basically when you UPDATE a row, at some point postgres will write the page
 which contains that row.

 RAID10 : write the page to all mirrors.
 RAID5/6 : write the page to the relevant disk. Read the corresponding page
 from all disks (minus one), compute parity, write parity.

 Actually it's not quite that bad.  You only have to read from two
 disks, the data disk and the parity disk, then compute new parity and
 write to both disks.  Still 2 reads / 2 writes for every write.

 As you can see one small write will need to hog all drives in the array.
 RAID5/6 performance for small random writes is really, really bad.

 Databases like RAID10 for reads too because when you need some random data
 you can get it from any of the mirrors, so you get increased parallelism on
 reads too.

 Also for sequential access RAID-10 can read both drives in a pair
 interleaved so you get 50% of the data you need from each drive and
 double the read rate there.  This is even true for linux software md
 RAID.


 My experience is that it is ONLY true for software RAID and ZFS.  Most 
 hardware raid controllers read both mirrors and validate that the data is 
 equal, and thus writing is about as fast as read.  Tested with Adaptec, 
 3Ware, Dell PERC 4/5/6, and LSI MegaRaid hardware wise.  In all cases it was 
 clear that the hardware raid was not using data from the two mirrors to 
 improve read performance for sequential or random I/O.

Interesting.  I'm using an Areca; I'll have to run some tests and see
if a mirror is reading at  100% read speed of a single drive or not.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-07 Thread Justin Pitts
 Yes, I know that.  I am very familiar with how RAID6 works.  RAID5
 with the hot spare already rebuilt / built in is a good enough answer
 for management where big words like parity might scare some PHBs.

 In terms of storage cost, it IS like paying for RAID5 + a hot spare,
 but the protection is better.

 A RAID 5 with a hot spare built in could not survive 2 simultaneous
 drive failures.

 Exactly.  Which is why I had said with the hot spare already built in
 / rebuilt.

My apologies. The 'rebuilt' slant escaped me. That's a fair way to cast it.

 Geeze, pedant much?

Of course!



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-06 Thread Matthew Wakeling

On Thu, 5 Aug 2010, Scott Marlowe wrote:

RAID6 is basically RAID5 with a hot spare already built into the
array.


On Fri, 6 Aug 2010, Pierre C wrote:

As others said, RAID6 is RAID5 + a hot spare.


No. RAID6 is NOT RAID5 plus a hot spare.

RAID5 uses a single parity datum (XOR) to ensure protection against data 
loss if one drive fails.


RAID6 uses two different sets of parity (Reed-Solomon) to ensure 
protection against data loss if two drives fail simultaneously.


If you have a RAID5 set with a hot spare, and you lose two drives, then 
you have data loss. If the same happens to a RAID6 set, then there is no 
data loss.
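To make the difference concrete, a small Python sketch of both parity schemes on byte strings: RAID5-style XOR parity P can rebuild any one missing block, while adding a RAID6-style Q syndrome (a Reed-Solomon code over GF(2^8)) lets two missing data blocks be solved for. With P alone, two lost blocks leave one equation for two unknowns, which is the RAID5-plus-spare failure case. The sketch assumes the field polynomial 0x11d with generator 2, a common RAID6 convention; real controllers rotate parity across disks and work on whole stripes, which is left out. Function names are hypothetical.

```python
def _gf_tables(poly=0x11D):
    """Exp/log tables for GF(2^8) with generator 2."""
    exp, log = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x100:
            x ^= poly
    for i in range(255, 512):
        exp[i] = exp[i - 255]  # doubled table avoids a modulo in gf_mul
    return exp, log


EXP, LOG = _gf_tables()


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """a / b in GF(2^8); b must be non-zero."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(blocks, skip=()):
    """P = XOR of blocks, Q = XOR of g^i * block_i, ignoring indexes in skip."""
    size = len(next(b for b in blocks if b is not None))
    p, q = bytearray(size), bytearray(size)
    for i, blk in enumerate(blocks):
        if i in skip:
            continue
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(EXP[i], byte)
    return p, q


def rebuild_one(blocks, x, p):
    """RAID5 style: a single lost block is P XOR the surviving blocks."""
    px, _ = syndromes(blocks, skip=(x,))
    return bytes(a ^ b for a, b in zip(p, px))


def rebuild_two(blocks, x, y, p, q):
    """RAID6 style: recover lost data blocks x and y (x != y) from P and Q."""
    px, qx = syndromes(blocks, skip=(x, y))
    dx, dy = bytearray(len(p)), bytearray(len(p))
    denom = EXP[x] ^ EXP[y]  # non-zero because x != y
    for j in range(len(p)):
        pd, qd = p[j] ^ px[j], q[j] ^ qx[j]  # pd = Dx^Dy, qd = g^x*Dx ^ g^y*Dy
        dx[j] = gf_div(gf_mul(EXP[y], pd) ^ qd, denom)
        dy[j] = pd ^ dx[j]
    return bytes(dx), bytes(dy)
```

The extra Q computation on every write is also why RAID6 carries a bigger small-write penalty than RAID5.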


Matthew

--
And the lexer will say "Oh look, there's a null string. Oooh, there's 
another. And another.", and will fall over spectacularly when it realises
there are actually rather a lot.
- Computer Science Lecturer (edited)



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-06 Thread Scott Marlowe
On Fri, Aug 6, 2010 at 3:17 AM, Matthew Wakeling matt...@flymine.org wrote:
 On Thu, 5 Aug 2010, Scott Marlowe wrote:

 RAID6 is basically RAID5 with a hot spare already built into the
 array.

 On Fri, 6 Aug 2010, Pierre C wrote:

 As others said, RAID6 is RAID5 + a hot spare.

 No. RAID6 is NOT RAID5 plus a hot spare.

The original phrase was that RAID 6 was like RAID 5 with a hot spare
ALREADY BUILT IN.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-06 Thread Scott Marlowe
On Fri, Aug 6, 2010 at 11:32 AM, Justin Pitts justinpi...@gmail.com wrote:
 As others said, RAID6 is RAID5 + a hot spare.

 No. RAID6 is NOT RAID5 plus a hot spare.

 The original phrase was that RAID 6 was like RAID 5 with a hot spare
 ALREADY BUILT IN.

 Built-in, or not - it is neither. It is more than that, actually. RAID
 6 is like RAID 5 in that it uses parity for redundancy and pays a
 write cost for maintaining those parity blocks, but will maintain data
 integrity in the face of 2 simultaneous drive failures.

Yes, I know that.  I am very familiar with how RAID6 works.  RAID5
with the hot spare already rebuilt / built in is a good enough answer
for management where big words like parity might scare some PHBs.

 In terms of storage cost, it IS like paying for RAID5 + a hot spare,
 but the protection is better.

 A RAID 5 with a hot spare built in could not survive 2 simultaneous
 drive failures.

Exactly.  Which is why I had said with the hot spare already built in
/ rebuilt.  Geeze, pedant much?


-- 
To understand recursion, one must first understand recursion.



[PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Kenneth Cox
I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM running  
CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 SATA 7500RPM  
disks in RAID 6, and for the OLAP workload it feels* slow.  I have 6 more  
disks to add, and the RAID has to be rebuilt in any case, but first I  
would like to solicit general advice.  I know that's little data to go on,  
and I believe in the scientific method, but in this case I don't have the  
time to make many iterations.


My questions are simple, but in my reading I have not been able to find  
definitive answers:


1) Should I switch to RAID 10 for performance?  I see things like RAID 5  
is bad for a DB and RAID 5 is slow with = 6 drives but I see little on  
RAID 6.  RAID 6 was the original choice for more usable space with good  
redundancy.  My current performance is 85MB/s write, 151 MB/s reads (using  
dd of 2xRAM per  
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).
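A small Python helper that sizes the dd test at twice RAM, roughly following the procedure on the page linked above as I understand it (write zeros with dd, include a sync in the timing, then read the file back). The file name bigfile, the 8 KiB block size, and the helper itself are placeholders. As other replies in this thread point out, this measures sequential throughput only, not the random I/O a database mostly does.

```python
def dd_test_commands(ram_bytes, block_kib=8):
    """Size a dd test file at twice RAM so the OS cache cannot absorb it."""
    count = (2 * ram_bytes) // (block_kib * 1024)
    # Timing the trailing sync makes the write figure include the flush to disk.
    write = f'time sh -c "dd if=/dev/zero of=bigfile bs={block_kib}k count={count} && sync"'
    read = f"time dd if=bigfile of=/dev/null bs={block_kib}k"
    return count, write, read


count, write_cmd, read_cmd = dd_test_commands(24 * 2**30)  # 24 GiB server
print(write_cmd)
print(read_cmd)
```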


2) Should I configure the ext3 file system with noatime and/or  
data=writeback or data=ordered?  My controller has a battery, the logical  
drive has write cache enabled (write-back), and the physical devices have  
write cache disabled (write-through).


3) Do I just need to spend more time configuring postgresql?  My  
non-default settings were largely generated by pgtune-0.9.3:


max_locks_per_transaction = 128 # manual; avoiding "out of shared memory"

default_statistics_target = 100
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 16GB
work_mem = 352MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 2316MB
max_connections = 32

I am happy to take informed opinion.  If you don't have the time to  
properly cite all your sources but have suggestions, please send them.


Thanks in advance,
Ken

* I know "feels slow" is not scientific.  What I mean is that any single  
query on a fact table, or any 'rm -rf' of a big directory sends disk  
utilization to 100% (measured with iostat -x 3).




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Alan Hodgson
On Thursday, August 05, 2010, Kenneth Cox kens...@gmail.com wrote:
 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little
 on RAID 6.  RAID 6 was the original choice for more usable space with
 good redundancy.  My current performance is 85MB/s write, 151 MB/s reads
 (using dd of 2xRAM per
 http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).

If you can spare the drive space, go to RAID 10. RAID 5/6 usually look fine 
on single-threaded sequential tests (unless your controller really sucks), 
but in the real world with multiple processes doing random I/O RAID 10 will 
go a lot further on the same drives. Plus your recovery time from disk 
failures will be a lot faster.

If you can't spare the drive space ... you should buy more drives. 

 
 2) Should I configure the ext3 file system with noatime and/or
 data=writeback or data=ordered?  My controller has a battery, the logical
 drive has write cache enabled (write-back), and the physical devices have
 write cache disabled (write-through).

noatime is fine, but minor filesystem options rarely show much impact. 
My best performance comes from XFS filesystems created with stripe options 
matching the underlying RAID array. Anything else is just a bonus.
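A minimal Python helper for deriving the stripe options: su is the per-disk chunk (stripe unit) configured on the controller, and sw is the number of data-bearing disks in the stripe. The 64 KiB chunk used below is an assumption; read the real value from the controller configuration. The helper names are hypothetical.

```python
def data_disks(level, disks):
    """Number of disks holding distinct data in one stripe."""
    if level == "raid10":
        if disks % 2:
            raise ValueError("RAID 10 needs an even number of disks")
        return disks // 2  # the other half hold mirror copies
    if level == "raid5":
        return disks - 1   # one disk's worth of parity
    if level == "raid6":
        return disks - 2   # two disks' worth of parity (P and Q)
    raise ValueError(f"unknown RAID level: {level}")


def xfs_stripe_option(level, disks, stripe_unit_kib):
    """Build the -d su=...,sw=... part of a mkfs.xfs command line."""
    return f"-d su={stripe_unit_kib}k,sw={data_disks(level, disks)}"


print(xfs_stripe_option("raid10", 12, 64))
```

The result would be used as, for example, `mkfs.xfs -d su=64k,sw=6 /dev/sdX`, where /dev/sdX is a placeholder for the array device.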

 * I know feels slow is not scientific.  What I mean is that any single
 query on a fact table, or any 'rm -rf' of a big directory sends disk
 utilization to 100% (measured with iostat -x 3).

.. and it should. Any modern system can peg a small disk array without much 
effort. Disks are slow.

-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Scott Marlowe
On Thu, Aug 5, 2010 at 12:28 PM, Kenneth Cox kens...@gmail.com wrote:
 I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM running
 CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 SATA 7500RPM
 disks in RAID 6, and for the OLAP workload it feels* slow.  I have 6 more
 disks to add, and the RAID has to be rebuilt in any case, but first I would
 like to solicit general advice.  I know that's little data to go on, and I
 believe in the scientific method, but in this case I don't have the time to
 make many iterations.

 My questions are simple, but in my reading I have not been able to find
 definitive answers:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5 is
 bad for a DB and RAID 5 is slow with = 6 drives but I see little on RAID
 6.  RAID 6 was the original choice for more usable space with good
 redundancy.  My current performance is 85MB/s write, 151 MB/s reads (using
 dd of 2xRAM per
 http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).

Sequential read / write is not very useful for a database benchmark.
It does kind of give you a baseline for throughput, but most db access
is mixed enough that random access becomes the important measurement.

RAID6 is basically RAID5 with a hot spare already built into the
array.  This makes rebuild less of an issue since you can reduce the
spare io used to rebuild the array to something really small.
However, it's in the same performance ballpark as RAID 5 with the
accompanying write performance penalty.

RAID-10 is pretty much the only way to go for a DB, and if you need
more space, you need more or bigger drives, not RAID-5/6.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Greg Smith

Kenneth Cox wrote:
1) Should I switch to RAID 10 for performance?  I see things like 
RAID 5 is bad for a DB and RAID 5 is slow with = 6 drives but I 
see little on RAID 6.  RAID 6 was the original choice for more usable 
space with good redundancy.  My current performance is 85MB/s write, 
151 MB/s reads 


RAID6 is no better than RAID5 performance-wise; it just has better fault 
tolerance.  And the ServeRAID 8k is a pretty underpowered card as RAID 
controllers go, so it would not be impossible for its RAID parity 
computation and the like to be the bottleneck here.  I'd expect a 6-disk 
RAID10 with 7200RPM drives to be closer to 120MB/s on writes, so you're 
not getting ideal performance there.  Your read figure is more 
competitive, but that's usually the RAID5 pattern: decent on reads, 
sluggish on writes.


2) Should I configure the ext3 file system with noatime and/or 
data=writeback or data=ordered?  My controller has a battery, the 
logical drive has write cache enabled (write-back), and the physical 
devices have write cache disabled (write-through).


data=ordered is the ext3 default and usually a reasonable choice.  Using 
writeback instead can be dangerous; I wouldn't advise starting there.  
noatime is certainly a good thing, but the speedup is pretty minor if 
you have a battery-backed write cache.



3) Do I just need to spend more time configuring postgresql?  My 
non-default settings were largely generated by pgtune-0.9.3


Those look reasonable enough, except there's no reason to make wal_buffers 
bigger than 16MB.  That work_mem figure might be high too; that's a 
known concern with pgtune I need to knock out of it one day soon.  When 
you are hitting high I/O wait periods, is the system running out of RAM 
and swapping?  That can cause really nasty I/O wait.
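A back-of-the-envelope Python check of the work_mem concern, using Ken's settings above. A single query can use work_mem once per sort or hash step, so the number of such steps per query is an assumption here; the estimate also ignores maintenance_work_mem, per-connection overhead, and the OS cache. The function name is hypothetical.

```python
def worst_case_memory_mb(shared_buffers_mb, max_connections, work_mem_mb,
                         sort_nodes_per_query=1):
    """Pessimistic RAM estimate: shared buffers plus every connection using
    work_mem once per sort/hash node at the same time."""
    return shared_buffers_mb + max_connections * work_mem_mb * sort_nodes_per_query


RAM_MB = 24 * 1024  # the 24GB server in this thread
for nodes in (1, 2):
    need = worst_case_memory_mb(2316, 32, 352, nodes)
    print(f"{nodes} sort/hash node(s) per query: {need} MB of {RAM_MB} MB RAM")
```

With two such steps per query across all 32 connections, the estimate already exceeds physical RAM, which is the kind of situation that ends in swapping and heavy I/O wait.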


Your basic hardware is off a bit, but not so badly that I'd start 
there.  Have you turned on slow query logging to see what is hammering 
the system when the iowait climbs?  Often tuning those by looking at the 
EXPLAIN ANALYZE output can be much more effective than hardware/server 
configuration tuning.


* I know feels slow is not scientific.  What I mean is that any 
single query on a fact table, or any 'rm -rf' of a big directory sends 
disk utilization to 100% (measured with iostat -x 3).


rm -rf is really slow on ext3 on just about any hardware.  If your 
fact tables aren't in RAM and you run a query against them, paging them 
back in again will hammer the disks until it's done.  That's not 
necessarily indicative of a misconfiguration on its own.




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Craig James

On 8/5/10 11:28 AM, Kenneth Cox wrote:

I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM
running CentOS 5.4 x86_64. I have a ServeRAID 8k controller with 6 SATA
7500RPM disks in RAID 6, and for the OLAP workload it feels* slow
 My current performance is 85MB/s write, 151 MB/s reads


I get 193MB/sec write and 450MB/sec read on a RAID10 on 8 SATA 7200 RPM disks.  
RAID10 seems to scale linearly -- add disks, get more speed, to the limit of 
your controller.
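Craig's "add disks, get more speed, to the limit of your controller" rule can be turned into a quick projection from his 8-disk numbers. This is a sketch that assumes perfectly linear scaling per disk. The controller ceiling is a hypothetical parameter, since nobody in the thread measured the ServeRAID 8k's limit.

```python
def project_raid10(measured_mb_s: float, measured_disks: int, target_disks: int,
                   controller_cap_mb_s: float = float("inf")) -> float:
    """Scale a measured RAID-10 throughput linearly with disk count,
    then clamp it to an assumed controller ceiling."""
    if measured_disks % 2 or target_disks % 2:
        raise ValueError("RAID-10 needs an even number of disks")
    per_disk = measured_mb_s / measured_disks
    return min(per_disk * target_disks, controller_cap_mb_s)

print(project_raid10(450, 8, 12))                           # 675.0 (read)
print(project_raid10(193, 8, 12))                           # 289.5 (write)
print(project_raid10(450, 8, 12, controller_cap_mb_s=500))  # 500 (hypothetical cap)
```

Real arrays fall short of linear once the controller or bus saturates, which is why the cap is there.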

Craig



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Scott Marlowe
On Thu, Aug 5, 2010 at 4:27 PM, Pierre C li...@peufeu.com wrote:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little on
 RAID 6.

 As others said, RAID6 is RAID5 + a hot spare.

 Basically when you UPDATE a row, at some point postgres will write the page
 which contains that row.

 RAID10 : write the page to all mirrors.
 RAID5/6 : write the page to the relevant disk. Read the corresponding page
 from all disks (minus one), compute parity, write parity.

Actually it's not quite that bad.  You only have to read from two
disks, the data disk and the parity disk, then compute new parity and
write to both disks.  Still 2 reads / 2 writes for every write.
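Scott's 2-read/2-write count follows from XOR parity: new parity = old parity XOR old data XOR new data, so the other data disks in the stripe never have to be read. Below is a minimal sketch of that read-modify-write. It assumes simple byte-wise XOR parity as in RAID-5 (RAID-6 adds a second, differently computed parity block that is not modeled here). The chunk sizes and names are illustrative only.

```python
from functools import reduce

def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: reads old data + old parity, returns the new parity
    to write alongside the new data (2 reads, 2 writes)."""
    return xor_bytes(old_parity, old_data, new_data)

# One stripe of three data chunks plus parity (a 4-disk RAID-5, tiny 4-byte chunks)
d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
parity = xor_bytes(d0, d1, d2)

new_d1 = b"\xff\x00\xff\x00"
new_parity = raid5_small_write(d1, parity, new_d1)

# Same answer as recomputing parity from the whole stripe, without reading d0 or d2
assert new_parity == xor_bytes(d0, new_d1, d2)
```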

 As you can see one small write will need to hog all drives in the array.
 RAID5/6 performance for small random writes is really, really bad.

 Databases like RAID10 for reads too because when you need some random data
 you can get it from any of the mirrors, so you get increased parallelism on
 reads too.

Also for sequential access RAID-10 can read both drives in a pair
interleaved so you get 50% of the data you need from each drive and
double the read rate there.  This is even true for linux software md
RAID.
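A simplified model of the interleaved mirror read Scott describes: consecutive blocks of a sequential read alternate between the two copies, so each drive supplies half the data. This is a sketch under that round-robin assumption only. Actual controllers and Linux md choose which copy to read with their own balancing logic, which this does not reproduce.

```python
def split_mirror_reads(first_block: int, count: int, mirrors: int = 2) -> list:
    """Assign a run of sequential block reads round-robin across the copies
    in a mirror set (an illustrative model, not any driver's real policy)."""
    plan = [[] for _ in range(mirrors)]
    for i in range(count):
        plan[i % mirrors].append(first_block + i)
    return plan

print(split_mirror_reads(100, 6))  # [[100, 102, 104], [101, 103, 105]]
```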

 with good redundancy.  My current performance is 85MB/s write, 151 MB/s
 reads

 FYI, I get 200 MB/s sequential out of the software RAID5 of 3 cheap desktop
 consumer SATA drives in my home multimedia server...

On a machine NOT configured for max seq throughput (it's used for
mostly OLTP stuff) I get 325MB/s both read and write speed with a 26
disk RAID-10.  OTOH, that setup gets ~6000 to 7000 transactions per
second with multi-day runs of pgbench.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Dave Crooke
Definitely switch to RAID-10: it's not merely that it's a fair bit
faster on normal operations (less seek contention), it's **WAY** faster than
any parity based RAID (RAID-2 through RAID-6) in degraded mode when you lose
a disk and have to rebuild it. This is something many people don't test for,
and then get bitten badly when they lose a drive under production loads.

Use higher capacity drives if necessary to make your data fit in the number
of spindles your controller supports ... the difference in cost is modest
compared to an overall setup, especially with SATA. Make sure you still
leave at least one hot spare!
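To see the capacity trade-off behind "use higher capacity drives", here is a sketch that counts usable space in whole drives for a given number of slots and hot spares. It assumes 12 slots (Kenneth's 6 disks plus the 6 he plans to add) and identical drives. Because RAID-10 needs drives in pairs, an odd leftover drive becomes an extra spare.

```python
def usable_drives(level: str, slots: int, hot_spares: int = 1) -> int:
    """Data capacity, in whole drives, left after hot spares and redundancy."""
    n = slots - hot_spares
    if level == "raid10":
        if n % 2:
            n -= 1          # RAID-10 needs pairs; the odd drive becomes another spare
        return n // 2
    if level == "raid5":
        return n - 1
    if level == "raid6":
        return n - 2
    raise ValueError(f"unknown RAID level: {level}")

for level in ("raid10", "raid5", "raid6"):
    print(level, usable_drives(level, 12))
# raid10 5   (10 drives in the array, 2 spares)
# raid5 10
# raid6 9
```

Multiply by drive size to compare options. Moving to larger drives is what lets RAID-10 match RAID-6's usable space.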

In normal operation, RAID-5 has to read and write 2 drives for every write
... not sure about RAID-6 but I suspect it needs to read the entire stripe
to recalculate the second (Q) parity, and it definitely has to write to 3
drives for each write, which means seeking all 3 of those drives to that
position. In degraded mode (a disk rebuilding) with either of those levels,
ALL the drives have to seek to that point for every write, and for any reads
of the failed drive, so seek contention is horrendous.
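The per-write costs discussed in this thread can be turned into a rough small-random-write ceiling for each level. This is a sketch with stated assumptions. It uses the read-modify-write path (RAID-6 reading and rewriting data, P and Q), although some implementations instead read the rest of the stripe, which changes the count. It assumes about 80 IOPS per 7200RPM SATA drive, and it ignores the controller's write cache.

```python
# (reads, writes) of physical I/O per small random write, read-modify-write path
SMALL_WRITE_IOS = {
    "raid10": (0, 2),   # write both copies of the mirror pair
    "raid5":  (2, 2),   # read old data + parity, write new data + parity
    "raid6":  (3, 3),   # read old data + P + Q, write new data + P + Q
}

def random_write_iops(level: str, disks: int, iops_per_disk: float) -> float:
    """Array-wide small-write IOPS ceiling, ignoring controller cache."""
    reads, writes = SMALL_WRITE_IOS[level]
    return disks * iops_per_disk / (reads + writes)

for level in SMALL_WRITE_IOS:
    print(level, random_write_iops(level, disks=12, iops_per_disk=80))  # 80 IOPS is assumed
# raid10 480.0
# raid5 240.0
# raid6 160.0
```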

RAID-5 and RAID-6 are designed for optimum capacity and protection at the
cost of write performance, which is fine for a general file server.

Parity RAID simply isn't suitable for database use; anyone who claims
otherwise either (a) doesn't understand the failure modes of RAID, or (b) is
running in a situation where performance simply doesn't matter.

Cheers
Dave

On Thu, Aug 5, 2010 at 1:28 PM, Kenneth Cox kens...@gmail.com wrote:

 My questions are simple, but in my reading I have not been able to find
 definitive answers:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little on
 RAID 6.  RAID 6 was the original choice for more usable space with good
 redundancy.  My current performance is 85MB/s write, 151 MB/s reads (using
 dd of 2xRAM per
 http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm
 ).
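The point of the "2xRAM" dd test is that the test file must be larger than the OS page cache can hold, so the numbers reflect the disks rather than memory. Below is a sketch of the sizing arithmetic for the 24GB server in this thread, assuming 8kB blocks (PostgreSQL's page size). The output file name `bigfile` is a placeholder.

```python
GIB = 1024 ** 3

def dd_test_blocks(ram_gib: int, block_size: int = 8192, multiple: int = 2) -> int:
    """Number of dd blocks needed for a test file `multiple` times the size of RAM."""
    return multiple * ram_gib * GIB // block_size

blocks = dd_test_blocks(24)
print(blocks)  # 6291456
# bs=8k is 8192 bytes in dd; sync makes the timing include flushing to disk
print(f'time sh -c "dd if=/dev/zero of=bigfile bs=8k count={blocks} && sync"')
```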



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Scott Marlowe
On Thu, Aug 5, 2010 at 5:13 PM, Dave Crooke dcro...@gmail.com wrote:
 Definitely switch to RAID-10: it's not merely that it's a fair bit
 faster on normal operations (less seek contention), it's **WAY** faster than
 any parity based RAID (RAID-2 through RAID-6) in degraded mode when you lose
 a disk and have to rebuild it. This is something many people don't test for,
 and then get bitten badly when they lose a drive under production loads.

Had a friend with a 600G x 5 disk RAID-5 and one drive died.  It took
nearly 48 hours to rebuild the array.
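To put Scott's anecdote in perspective, here is the average rate at which the replacement drive was filled. This sketch assumes "600G x 5" means five 600GB drives, that the rebuild wrote one full drive's worth of data, and that it ran for the whole 48 hours. It uses decimal units (GB = 10^9 bytes, MB = 10^6 bytes).

```python
def effective_rebuild_mb_s(drive_gb: float, hours: float) -> float:
    """Average rebuild rate in MB/s for refilling one drive of `drive_gb` GB."""
    return drive_gb * 1000 / (hours * 3600)

rate = effective_rebuild_mb_s(600, 48)
print(f"{rate:.1f} MB/s")  # 3.5 MB/s
```

That is a small fraction of what a single SATA drive can write sequentially, consistent with the seek contention Dave describes during degraded-mode operation.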

 Use higher capacity drives if necessary to make your data fit in the number
 of spindles your controller supports ... the difference in cost is modest
 compared to an overall setup, especially with SATA. Make sure you still
 leave at least one hot spare!

Yeah, a lot of chassis hold an even number of drives, and I wind up
with 2 hot spares because of it.

 Parity RAID simply isn't suitable for database use; anyone who claims
 otherwise either (a) doesn't understand the failure modes of RAID, or (b) is
 running in a situation where performance simply doesn't matter.

The only time it's acceptable is when you're running something like
low write volume report generation / batch processing, where you're
mostly sequentially reading and writing 100s of gigabytes at a time in
one or maybe two threads.

-- 
To understand recursion, one must first understand recursion.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 06:28, Kenneth Cox wrote:
I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM 
running CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 
SATA 7500RPM disks in RAID 6, and for the OLAP workload it feels* 
slow.  I have 6 more disks to add, and the RAID has to be rebuilt in 
any case, but first I would like to solicit general advice.  I know 
that's little data to go on, and I believe in the scientific method, 
but in this case I don't have the time to make many iterations.


My questions are simple, but in my reading I have not been able to 
find definitive answers:


1) Should I switch to RAID 10 for performance?  I see things like 
RAID 5 is bad for a DB and RAID 5 is slow with = 6 drives but I 
see little on RAID 6.  RAID 6 was the original choice for more usable 
space with good redundancy.  My current performance is 85MB/s write, 
151 MB/s reads (using dd of 2xRAM per 
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).




Normally I'd agree with the others and recommend RAID10 - but you say 
you have an OLAP workload - if it is *heavily* read biased you may get 
better performance with RAID5 (more effective disks to read from). 
Having said that, your sequential read performance right now is pretty 
low (151 MB/s  - should be double this), which may point to an issue 
with this controller. Unfortunately this *may* be important for an OLAP 
workload (seq scans of big tables).




2) Should I configure the ext3 file system with noatime and/or 
data=writeback or data=ordered?  My controller has a battery, the 
logical drive has write cache enabled (write-back), and the physical 
devices have write cache disabled (write-through).




Probably wise to use noatime. If you have a heavy write workload (i.e. 
what I just wrote above does *not* apply), then you might find adjusting 
the ext3 commit interval upwards from its default of 5 seconds can help 
(I'm doing some testing at the moment and commit=20 seemed to improve 
performance by about 5-10%).


3) Do I just need to spend more time configuring postgresql?  My 
non-default settings were largely generated by pgtune-0.9.3:


max_locks_per_transaction = 128 # manual; avoiding out of shared memory

default_statistics_target = 100
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 16GB
work_mem = 352MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 2316MB
max_connections = 32



Possibly higher checkpoint_segments and lower wal_buffers (I recall 
someone, maybe Greg, suggesting that there was no benefit in having the 
latter larger than 10MB). I wonder about setting shared_buffers higher - how large 
is the database?
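A quick check of the posted settings against the rules of thumb raised in this thread. It flags wal_buffers above Greg's 16MB ceiling and reports shared_buffers as a share of the 24GB of RAM. This is a sketch only. The helper name is hypothetical, and the right shared_buffers value still depends on the database size Mark asks about.

```python
RAM_MB = 24 * 1024
POSTED_MB = {"shared_buffers": 2316, "wal_buffers": 32}  # from the pgtune output above

def review(settings_mb: dict, ram_mb: int, wal_buffers_cap_mb: int = 16) -> list:
    """Return human-readable notes on the two settings discussed in the thread."""
    notes = []
    if settings_mb["wal_buffers"] > wal_buffers_cap_mb:
        notes.append(f"wal_buffers={settings_mb['wal_buffers']}MB exceeds {wal_buffers_cap_mb}MB")
    share = settings_mb["shared_buffers"] / ram_mb
    notes.append(f"shared_buffers is {share:.0%} of RAM")
    return notes

for note in review(POSTED_MB, RAM_MB):
    print(note)
# wal_buffers=32MB exceeds 16MB
# shared_buffers is 9% of RAM
```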


Cheers

Mark




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Alan Hodgson
On Thursday, August 05, 2010, Mark Kirkwood mark.kirkw...@catalyst.net.nz 
wrote:
 Normally I'd agree with the others and recommend RAID10 - but you say
 you have an OLAP workload - if it is *heavily* read biased you may get
 better performance with RAID5 (more effective disks to read from).
 Having said that, your sequential read performance right now is pretty
 low (151 MB/s  - should be double this), which may point to an issue
 with this controller. Unfortunately this *may* be important for an OLAP
 workload (seq scans of big tables).

Probably a low (default) readahead limitation. ext3 doesn't help but it can 
usually get up over 400MB/sec. Doubt it's the controller.

-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 11:58, Alan Hodgson wrote:

On Thursday, August 05, 2010, Mark Kirkwood mark.kirkw...@catalyst.net.nz
wrote:
   

Normally I'd agree with the others and recommend RAID10 - but you say
you have an OLAP workload - if it is *heavily* read biased you may get
better performance with RAID5 (more effective disks to read from).
Having said that, your sequential read performance right now is pretty
low (151 MB/s  - should be double this), which may point to an issue
with this controller. Unfortunately this *may* be important for an OLAP
workload (seq scans of big tables).
 

Probably a low (default) readahead limitation. ext3 doesn't help but it can
usually get up over 400MB/sec. Doubt it's the controller.

   


Yeah - good suggestion, so cranking up readahead (man blockdev) and 
retesting is recommended.
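When following the blockdev suggestion, note that `blockdev --getra` and `--setra` count readahead in 512-byte sectors, while Linux sysfs exposes the same setting in kilobytes. Below is a sketch for converting between the two and reading the current value. The device name passed in is a placeholder, and the sysfs path exists only on Linux.

```python
from pathlib import Path
from typing import Optional

SECTOR_BYTES = 512  # unit used by blockdev --getra / --setra

def sectors_to_kib(sectors: int) -> int:
    """Convert a blockdev readahead value (512-byte sectors) to KiB."""
    return sectors * SECTOR_BYTES // 1024

def read_ahead_kib(device: str) -> Optional[int]:
    """Current readahead for e.g. 'sda' from sysfs, or None if unavailable."""
    path = Path("/sys/block") / device / "queue" / "read_ahead_kb"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

print(sectors_to_kib(256))   # 128, a common kernel default
print(sectors_to_kib(4096))  # 2048
```

Retest with the same dd run after each change so the comparison against the 151 MB/s read figure is like for like.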


Cheers

Mark
