Re: Postgres on RAID5

2005-03-16 Thread Michael Tokarev
David Dougall wrote:
> In my experience, if you are concerned about filesystem performance, don't
> use ext3.  It is one of the slowest filesystems I have ever used,
> especially for writes.  I would suggest either reiserfs or xfs.
I'm a bit afraid to start yet another filesystem flamewar, but:
please don't make such claims without providing actual numbers
and config details.  Pretty please.

ext3 performs well for databases; there's no reason for it to be
slow.  Ok, enable data=journal and use it with e.g. Oracle - you will
see it is slow.  But in that case it isn't the filesystem to blame,
it's operator error, simple as that.

And reiserfs especially, with its tail packing enabled by default,
is NOT suitable for databases...
/mjt


Re: Postgres on RAID5

2005-03-16 Thread David Dougall
In my experience, if you are concerned about filesystem performance, don't
use ext3.  It is one of the slowest filesystems I have ever used,
especially for writes.  I would suggest either reiserfs or xfs.
--David Dougall


On Fri, 11 Mar 2005, Arshavir Grigorian wrote:

> Hi,
>
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).
>
> Top shows that the Postgres process (postmaster) is being constantly put
> into D state for extended periods of time (2-3 seconds) which I assume
> is because it's waiting for disk io. I have just started gathering
> system statistics and here is what sar -b shows: (this is while the db
> is being loaded - pg_restore)
>
>              tps     rtps     wtps   bread/s   bwrtn/s
> 01:35:01 PM  275.77   76.12   199.66   709.59   2315.23
> 01:45:01 PM  287.25   75.56   211.69   706.52   2413.06
> 01:55:01 PM  281.73   76.35   205.37   711.84   2389.86
> 02:05:01 PM  282.83   76.14   206.69   720.85   2418.51
> 02:15:01 PM  284.07   76.15   207.92   707.38   2443.60
> 02:25:01 PM  265.46   75.91   189.55   708.87   2089.21
> 02:35:01 PM  285.21   76.02   209.19   709.58   2446.46
> Average:     280.33   76.04   204.30   710.66   2359.47
>
> This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM.
> It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom
> compiled kernel. Postgres is installed from the Debian package and uses
> all the configuration defaults.
>
> I am also copying the pgsql-performance list.
>
> Thanks in advance for any advice/pointers.
>
>
> Arshavir
>
> Following is some other info that might be helpful.
>
> /proc/scsi# mdadm -D /dev/md1
> /dev/md1:
>  Version : 00.90.00
>Creation Time : Wed Feb 23 17:23:41 2005
>   Raid Level : raid5
>   Array Size : 123823616 (118.09 GiB 126.80 GB)
>  Device Size : 8844544 (8.43 GiB 9.06 GB)
> Raid Devices : 15
>Total Devices : 17
> Preferred Minor : 1
>  Persistence : Superblock is persistent
>
>  Update Time : Thu Feb 24 10:05:38 2005
>State : active
>   Active Devices : 15
> Working Devices : 16
>   Failed Devices : 1
>Spare Devices : 1
>
>   Layout : left-symmetric
>   Chunk Size : 64K
>
> UUID : 81ae2c97:06fa4f4d:87bfc6c9:2ee516df
>   Events : 0.8
>
>     Number   Major   Minor   RaidDevice State
>        0       8       64        0      active sync   /dev/sde
>        1       8       80        1      active sync   /dev/sdf
>        2       8       96        2      active sync   /dev/sdg
>        3       8      112        3      active sync   /dev/sdh
>        4       8      128        4      active sync   /dev/sdi
>        5       8      144        5      active sync   /dev/sdj
>        6       8      160        6      active sync   /dev/sdk
>        7       8      176        7      active sync   /dev/sdl
>        8       8      192        8      active sync   /dev/sdm
>        9       8      208        9      active sync   /dev/sdn
>       10       8      224       10      active sync   /dev/sdo
>       11       8      240       11      active sync   /dev/sdp
>       12      65        0       12      active sync   /dev/sdq
>       13      65       16       13      active sync   /dev/sdr
>       14      65       32       14      active sync   /dev/sds
>
>       15      65       48       15      spare         /dev/sdt
>
> # dumpe2fs -h /dev/md1
> dumpe2fs 1.35 (28-Feb-2004)
> Filesystem volume name:   
> Last mounted on:  
> Filesystem UUID:  1bb95bd6-94c7-4344-adf2-8414cadae6fc
> Filesystem magic number:  0xEF53
> Filesystem revision #:1 (dynamic)
> Filesystem features:  has_journal dir_index needs_recovery large_file
> Default mount options:(none)
> Filesystem state: clean
> Errors behavior:  Continue
> Filesystem OS type:   Linux
> Inode count:  15482880
> Block count:  30955904
> Reserved block count: 1547795
> Free blocks:  28767226
> Free inodes:  15482502
> First block:  0
> Block size:   4096
> Fragment size:4096
> Blocks per group: 32768
> Fragments per group:  32768
> Inodes per group: 16384
> Inode blocks per group:   512
> Filesystem created:   Wed Feb 23 17:27:13 2005
> Last mount time:  Wed Feb 23 17:45:25 2005
> Last write time:  Wed Feb 23 17:45:25 2005
> Mount count:  2
> Maximum mount count:  28
> Last checked: Wed Feb 23 17:27:13 2005
> Check interval:   15552000 (6 months)
> Next check after: Mon Aug 22 18:27:13 2005
> Reserved blocks uid:  0 (user root)
> Reserved blo

RE: [PERFORM] Postgres on RAID5

2005-03-14 Thread Guy
You said:
"If your write size is smaller than chunk_size*N (N = number of data blocks
in a stripe), in order to calculate correct parity you have to read data
from the remaining drives."

Neil explained it in this message:
http://marc.theaimsgroup.com/?l=linux-raid&m=108682190730593&w=2

Guy

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Tokarev
Sent: Monday, March 14, 2005 5:47 PM
To: Arshavir Grigorian
Cc: linux-raid@vger.kernel.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgres on RAID5

Arshavir Grigorian wrote:
> Alex Turner wrote:
> 
[]
> Well, by putting the pg_xlog directory on a separate disk/partition, I 
> was able to increase this rate to about 50 or so per second (still 
> pretty far from your numbers). Next I am going to try putting the 
> pg_xlog on a RAID1+0 array and see if that helps.

pg_xlog is written synchronously, right?  It should be, or else the
reliability of the database would be in question...

I posted a question on Feb-22 here in linux-raid, titled "*terrible*
direct-write performance with raid5".  There's a problem with the write
performance of a raid4/5/6 array which is due to the design.

Consider a raid5 array (raid4 will be exactly the same, and for raid6,
just double the parity writes) with N data blocks and 1 parity block
per stripe.  At the time of writing a portion of data, the parity block
has to be updated too, to stay consistent and recoverable.  And here,
the size of the write plays a very significant role.  If your write size
is smaller than chunk_size*N (N = number of data blocks in a stripe),
then in order to calculate correct parity you have to read data from the
remaining drives.  The only case where you don't need to read from the
other drives is when you're writing exactly chunk_size*N bytes AND the
write is stripe-aligned.  By default, chunk_size is 64Kb (the minimum is
4Kb).  So the only reasonable direct-write size for N data drives is
64Kb*N, or else the raid code will have to read the "missing" data to
calculate the parity block.  Of course, in 99% of cases you're writing
in much smaller sizes, say 4Kb or so.  And here, the more drives you
have, the LESS write speed you will get.

When using the O/S buffer and filesystem cache, the system has many
more chances to re-order requests and sometimes even omit reading
entirely (when you perform many sequential writes, for example,
without a sync in between), so buffered writes can be much faster.
But not direct or synchronous writes, again especially when you're
doing a lot of sequential writes...

So to me it looks like an inherent problem of the raid5 architecture
wrt database-like workloads -- databases tend to use synchronous
or direct writes to ensure good data consistency.

For pgsql, which (I don't know for sure, but reportedly) uses synchronous
writes only for the transaction log, it is a good idea to put that log
on a raid1 or raid10 array, but NOT on a raid5 array.

Just IMHO, of course.

/mjt


Re: [PERFORM] Postgres on RAID5

2005-03-14 Thread Michael Tokarev
Arshavir Grigorian wrote:
> Alex Turner wrote:
> []
> Well, by putting the pg_xlog directory on a separate disk/partition, I
> was able to increase this rate to about 50 or so per second (still
> pretty far from your numbers). Next I am going to try putting the
> pg_xlog on a RAID1+0 array and see if that helps.

pg_xlog is written synchronously, right?  It should be, or else the
reliability of the database would be in question...

I posted a question on Feb-22 here in linux-raid, titled "*terrible*
direct-write performance with raid5".  There's a problem with the write
performance of a raid4/5/6 array which is due to the design.

Consider a raid5 array (raid4 will be exactly the same, and for raid6,
just double the parity writes) with N data blocks and 1 parity block
per stripe.  At the time of writing a portion of data, the parity block
has to be updated too, to stay consistent and recoverable.  And here,
the size of the write plays a very significant role.  If your write size
is smaller than chunk_size*N (N = number of data blocks in a stripe),
then in order to calculate correct parity you have to read data from the
remaining drives.  The only case where you don't need to read from the
other drives is when you're writing exactly chunk_size*N bytes AND the
write is stripe-aligned.  By default, chunk_size is 64Kb (the minimum is
4Kb).  So the only reasonable direct-write size for N data drives is
64Kb*N, or else the raid code will have to read the "missing" data to
calculate the parity block.  Of course, in 99% of cases you're writing
in much smaller sizes, say 4Kb or so.  And here, the more drives you
have, the LESS write speed you will get.
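
To make that arithmetic concrete: with the default 64Kb chunk and 14 data
disks (as in the array discussed in this thread), a full stripe is
64Kb*14 = 896Kb.  Below is a small illustrative Python sketch of the
bookkeeping just described -- it is not md's actual code, only a rough
model of when a write triggers extra reads:

    CHUNK = 64 * 1024          # default md chunk_size
    N_DATA = 14                # data disks per stripe in a 15-disk raid5
    STRIPE = CHUNK * N_DATA    # full-stripe write size: 896 KiB here

    def extra_reads(write_size, offset=0):
        """Rough model of the read-modify-write penalty described above.

        A write covering whole, aligned stripes can compute parity from
        the new data alone; anything smaller forces the raid code to read
        old data and/or parity back in first.
        """
        if offset % STRIPE == 0 and write_size % STRIPE == 0:
            return 0                       # full-stripe write: no extra reads
        chunks_touched = (write_size + CHUNK - 1) // CHUNK
        return chunks_touched + 1          # old data chunk(s) + the old parity chunk

    print(extra_reads(4 * 1024))   # a 4 KiB write -> 2 extra reads
    print(extra_reads(STRIPE))     # an aligned 896 KiB write -> 0 extra reads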

When using the O/S buffer and filesystem cache, the system has many
more chances to re-order requests and sometimes even omit reading
entirely (when you perform many sequential writes, for example,
without a sync in between), so buffered writes can be much faster.
But not direct or synchronous writes, again especially when you're
doing a lot of sequential writes...

So to me it looks like an inherent problem of the raid5 architecture
wrt database-like workloads -- databases tend to use synchronous
or direct writes to ensure good data consistency.

For pgsql, which (I don't know for sure, but reportedly) uses synchronous
writes only for the transaction log, it is a good idea to put that log
on a raid1 or raid10 array, but NOT on a raid5 array.
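
One way to do that without touching the Postgres configuration is the usual
symlink trick: move pg_xlog onto the mirrored array and leave a symlink at
the old path.  A minimal Python sketch of that step, assuming the server is
stopped and using hypothetical paths:

    import os
    import shutil

    PGDATA = "/var/lib/postgres/data"   # hypothetical data directory
    WAL_DIR = "/mnt/raid10/pg_xlog"     # hypothetical directory on the raid1/10 array

    src = os.path.join(PGDATA, "pg_xlog")
    shutil.move(src, WAL_DIR)           # relocate the WAL onto the mirrored array
    os.symlink(WAL_DIR, src)            # Postgres still finds it at the old path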

Just IMHO, of course.
/mjt


Re: [PERFORM] Postgres on RAID5

2005-03-14 Thread Jim Buttafuoco
All,

I have a 13 disk (250G each) software raid 5 set using one 16-port Adaptec
SATA controller.  I am very happy with the performance.  The reason I went
with the 13 disk raid 5 set was for the space, NOT performance.  I have a
single postgresql database that is over 2 TB, with about 500 GB free on the
disk.  This raid set performs about the same as my ICP SCSI raid controller
(also with raid 5).

That said, now that postgresql 8 has tablespaces, I would NOT create one
single raid 5 set, but 3 smaller sets.  I also DO NOT have my WAL and logs
on this raid set, but on a smaller 2-disk mirror.

Jim

-- Original Message ---
From: Greg Stark <[EMAIL PROTECTED]>
To: Alex Turner <[EMAIL PROTECTED]>
Cc: Greg Stark <[EMAIL PROTECTED]>, Arshavir Grigorian <[EMAIL PROTECTED]>, 
linux-raid@vger.kernel.org,
pgsql-performance@postgresql.org
Sent: 14 Mar 2005 15:17:11 -0500
Subject: Re: [PERFORM] Postgres on RAID5

> Alex Turner <[EMAIL PROTECTED]> writes:
> 
> > a 14 drive stripe will max out the PCI bus long before anything else,
> 
> Hopefully anyone with a 14 drive stripe is using some combination of 64 bit
> PCI-X cards running at 66Mhz...
> 
> > the only reason for a stripe this size is to get a total accessible
> > size up.
> 
> Well, many drives also cuts average latency. So even if you have no need for
> more bandwidth you still benefit from a lower average response time by adding
> more drives.
> 
> -- 
> greg
> 
--- End of Original Message ---



Re: [PERFORM] Postgres on RAID5

2005-03-14 Thread Arshavir Grigorian
Alex Turner wrote:
> a 14 drive stripe will max out the PCI bus long before anything else;
> the only reason for a stripe this size is to get the total accessible
> size up.  A 6 drive RAID 10 on a good controller can get up to
> 400MB/sec, which is pushing the limit of the PCI bus (taken from
> official 3ware 9500S-8MI benchmarks).  140 drives is not going to beat
> 6 drives because you've run out of bandwidth on the PCI bus.
>
> The debate on RAID 5 rages onward.  The benchmarks I've seen suggest
> that RAID 5 is consistently slower than RAID 10 with the same number
> of drives, but others suggest that RAID 5 can be much faster than
> RAID 10 (see arstechnica.com).  (Theoretical performance of RAID 5 is
> in line with a RAID 0 stripe of N-1 drives, while RAID 10 has only N/2
> drives in a stripe, so its performance should be nearly double - in
> theory, of course.)
>
> 35 trans/sec is pretty slow, particularly if they are only one row at
> a time.  I typically get 200-400/sec on our DB server on a bad day, up
> to 1100 on a fresh database.
Well, by putting the pg_xlog directory on a separate disk/partition, I 
was able to increase this rate to about 50 or so per second (still 
pretty far from your numbers). Next I am going to try putting the 
pg_xlog on a RAID1+0 array and see if that helps.

> I suggested running a bonnie benchmark, or some other IO perftest to
> determine if it's the array itself performing badly, or if there is
> something wrong with postgresql.
>
> If the array isn't kicking out at least 50MB/sec read/write
> performance, something is wrong.
>
> Until you've isolated the problem to either postgres or the array,
> everything else is simply speculation.
>
> In a perfect world, you would have two 6-drive RAID 10s on two PCI
> buses, with system tables on a third partition and archive logging on
> a fourth.  Unsurprisingly, this looks a lot like the Oracle recommended
> minimum config.
Could you please elaborate on this setup a little more? How do you put 
system tables on a separate partition? I am still using version 7, and 
without tablespaces (which is how Oracle controls this), I can't figure 
out how to put different tables on different partitions. Thanks.

Arshavir

> Also of interest is that this is _software_ raid...
>
> Alex Turner
> netEconomist
>
> On 13 Mar 2005 23:36:13 -0500, Greg Stark <[EMAIL PROTECTED]> wrote:
>> Arshavir Grigorian <[EMAIL PROTECTED]> writes:
>>
>>> Hi,
>>>
>>> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an
>>> Ext3 filesystem which is used by Postgres.
>>
>> People are going to suggest moving to RAID1+0. I'm unconvinced that RAID5
>> across 14 drives shouldn't be able to keep up with RAID1 across 7 drives
>> though. It would be interesting to see empirical data.
>>
>> One thing that does scare me is the Postgres transaction log and the ext3
>> journal both sharing these disks with the data. Ideally both of these things
>> should get (mirrored) disks of their own separate from the data files.
>>
>> But 2-3s pauses seem disturbing. I wonder whether ext3 is issuing a cache
>> flush on every fsync to get the journal pushed out. This is a new linux
>> feature that's necessary with ide but shouldn't be necessary with scsi.
>>
>> It would be interesting to know whether postgres performs differently with
>> fsync=off. This would even be a reasonable mode to run under for initial
>> database loads. It shouldn't make much of a difference with hardware like this
>> though. And you should be aware that running under this mode in production
>> would put your data at risk.
>>
>> --
>> greg

--
Arshavir Grigorian
Systems Administrator/Engineer
M-CAM, Inc.
[EMAIL PROTECTED]
+1 703-682-0570 ext. 432
Contents Confidential


Re: [PERFORM] Postgres on RAID5

2005-03-14 Thread Greg Stark

Alex Turner <[EMAIL PROTECTED]> writes:

> a 14 drive stripe will max out the PCI bus long before anything else,

Hopefully anyone with a 14 drive stripe is using some combination of 64-bit
PCI-X cards running at 66MHz...

> the only reason for a stripe this size is to get a total accessible
> size up.  

Well, more drives also cut average latency. So even if you have no need for
more bandwidth, you still benefit from a lower average response time by adding
more drives.

-- 
greg



Re: [PERFORM] Postgres on RAID5

2005-03-14 Thread Alex Turner
a 14 drive stripe will max out the PCI bus long before anything else;
the only reason for a stripe this size is to get the total accessible
size up.  A 6 drive RAID 10 on a good controller can get up to
400MB/sec, which is pushing the limit of the PCI bus (taken from
official 3ware 9500S-8MI benchmarks).  140 drives is not going to beat
6 drives because you've run out of bandwidth on the PCI bus.
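
For context, the theoretical ceilings being talked about here work out as
bus_width * clock / 8.  A throwaway Python calculation (rough burst-rate
numbers; real buses lose some of this to protocol overhead and shared
devices):

    def pci_bandwidth_mb_s(width_bits, clock_mhz):
        """Theoretical burst bandwidth of a parallel PCI bus in MB/s."""
        return width_bits * clock_mhz / 8.0

    print(pci_bandwidth_mb_s(32, 33.33))   # plain 32-bit/33MHz PCI:  ~133 MB/s
    print(pci_bandwidth_mb_s(64, 66.66))   # 64-bit/66MHz PCI/PCI-X:  ~533 MB/s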

The debate on RAID 5 rages onward.  The benchmarks I've seen suggest
that RAID 5 is consistently slower than RAID 10 with the same number
of drives, but others suggest that RAID 5 can be much faster than
RAID 10 (see arstechnica.com).  (Theoretical performance of RAID 5 is
in line with a RAID 0 stripe of N-1 drives, while RAID 10 has only N/2
drives in a stripe, so its performance should be nearly double - in
theory, of course.)
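
That parenthetical claim is easy to put numbers on.  A toy Python comparison
of idealized streaming throughput only (it deliberately ignores the RAID 5
parity-update penalty, seeks and controller overhead, and the 50 MB/s
per-drive figure is just an assumption):

    PER_DRIVE_MB_S = 50            # assumed streaming rate of one drive

    def raid5_streaming(n_disks):
        return (n_disks - 1) * PER_DRIVE_MB_S   # one disk's worth goes to parity

    def raid10_streaming(n_disks):
        return (n_disks // 2) * PER_DRIVE_MB_S  # each mirror pair stripes as one

    for n in (6, 14):
        print(n, raid5_streaming(n), raid10_streaming(n))
    # 6 disks:  raid5 ~250 MB/s vs raid10 ~150 MB/s
    # 14 disks: raid5 ~650 MB/s vs raid10 ~350 MB/s -- "nearly double", as claimed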

35 trans/sec is pretty slow, particularly if they are only one row at
a time.  I typically get 200-400/sec on our DB server on a bad day, up
to 1100 on a fresh database.

I suggested running a bonnie benchmark, or some other IO perftest to
determine if it's the array itself performing badly, or if there is
something wrong with postgresql.

If the array isn't kicking out at least 50MB/sec read/write
performance, something is wrong.
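
If bonnie isn't handy, even a crude sequential-write probe will show whether
the array clears that bar.  A rough Python sketch (hypothetical path; bonnie
or bonnie++ will give far more trustworthy numbers):

    import os
    import time

    PATH = "/mnt/array/throughput.tmp"   # hypothetical file on the RAID5 filesystem
    SIZE = 512 * 1024 * 1024             # 512 MiB test file
    BLOCK = b"\0" * (1024 * 1024)

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.time()
    for _ in range(SIZE // len(BLOCK)):
        os.write(fd, BLOCK)
    os.fsync(fd)                         # make sure the data actually hit the array
    os.close(fd)
    elapsed = time.time() - start
    print("sequential write: %.1f MB/s" % (SIZE / (1024.0 * 1024.0) / elapsed))
    os.unlink(PATH)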

Until you've isolated the problem to either postgres or the array,
everything else is simply speculation.

In a perfect world, you would have two 6-drive RAID 10s on two PCI
buses, with system tables on a third partition and archive logging on
a fourth.  Unsurprisingly, this looks a lot like the Oracle recommended
minimum config.

Also of interest is that this is _software_ raid...

Alex Turner
netEconomist

On 13 Mar 2005 23:36:13 -0500, Greg Stark <[EMAIL PROTECTED]> wrote:
> 
> Arshavir Grigorian <[EMAIL PROTECTED]> writes:
> 
> > Hi,
> >
> > I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an
> > Ext3 filesystem which is used by Postgres.
> 
> People are going to suggest moving to RAID1+0. I'm unconvinced that RAID5
> across 14 drives shouldn't be able to keep up with RAID1 across 7 drives
> though. It would be interesting to see empirical data.
> 
> One thing that does scare me is the Postgres transaction log and the ext3
> journal both sharing these disks with the data. Ideally both of these things
> should get (mirrored) disks of their own separate from the data files.
> 
> But 2-3s pauses seem disturbing. I wonder whether ext3 is issuing a cache
> flush on every fsync to get the journal pushed out. This is a new linux
> feature that's necessary with ide but shouldn't be necessary with scsi.
> 
> It would be interesting to know whether postgres performs differently with
> fsync=off. This would even be a reasonable mode to run under for initial
> database loads. It shouldn't make much of a difference with hardware like this
> though. And you should be aware that running under this mode in production
> would put your data at risk.
> 
> --
> greg
> 
> 


Re: [PERFORM] Postgres on RAID5 (possible sync blocking read type issue on 2.6.11)

2005-03-13 Thread David Greaves
Greg Stark wrote:
> Arshavir Grigorian <[EMAIL PROTECTED]> writes:
>
>> Hi,
>>
>> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an
>> Ext3 filesystem which is used by Postgres.
>
> People are going to suggest moving to RAID1+0. I'm unconvinced that RAID5
> across 14 drives shouldn't be able to keep up with RAID1 across 7 drives
> though. It would be interesting to see empirical data.
>
> One thing that does scare me is the Postgres transaction log and the ext3
> journal both sharing these disks with the data. Ideally both of these things
> should get (mirrored) disks of their own separate from the data files.
>
> But 2-3s pauses seem disturbing. I wonder whether ext3 is issuing a cache
> flush on every fsync to get the journal pushed out. This is a new linux
> feature that's necessary with ide but shouldn't be necessary with scsi.
>
> It would be interesting to know whether postgres performs differently with
> fsync=off. This would even be a reasonable mode to run under for initial
> database loads. It shouldn't make much of a difference with hardware like this
> though. And you should be aware that running under this mode in production
> would put your data at risk.

Hi,

I'm coming in from the raid list, so I didn't get the full story.
May I ask what kernel?

I only ask because I upgraded to 2.6.11.2 and happened to be watching
xosview on my (probably) completely different setup (1Tb xfs/lvm2/raid5
served by nfs to a remote sustained read/write app), when I saw all read
activity cease for 2-3 seconds whilst the disk wrote, then disk reads
resumed. This occurred repeatedly during a read/edit/write of a 3Gb file.

Performance is not critical here, so it's on the "hmm, that's odd" todo list :)

David


Re: [PERFORM] Postgres on RAID5

2005-03-13 Thread Greg Stark

Arshavir Grigorian <[EMAIL PROTECTED]> writes:

> Hi,
> 
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an
> Ext3 filesystem which is used by Postgres. 

People are going to suggest moving to RAID1+0. I'm unconvinced that RAID5
across 14 drives shouldn't be able to keep up with RAID1 across 7 drives
though. It would be interesting to see empirical data.

One thing that does scare me is the Postgres transaction log and the ext3
journal both sharing these disks with the data. Ideally both of these things
should get (mirrored) disks of their own separate from the data files.

But 2-3s pauses seem disturbing. I wonder whether ext3 is issuing a cache
flush on every fsync to get the journal pushed out. This is a new linux
feature that's necessary with ide but shouldn't be necessary with scsi.

It would be interesting to know whether postgres performs differently with
fsync=off. This would even be a reasonable mode to run under for initial
database loads. It shouldn't make much of a difference with hardware like this
though. And you should be aware that running under this mode in production
would put your data at risk.
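
A quick way to measure exactly that is to time single-row commits under the
two settings.  A minimal sketch, assuming the psycopg2 driver and a scratch
database named testdb (both hypothetical here); run it once with fsync = on
and once with fsync = off in postgresql.conf, restarting the server in
between:

    import time
    import psycopg2                      # assumed driver; any DB-API module works

    conn = psycopg2.connect("dbname=testdb")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS fsync_probe (id int, payload text)")
    conn.commit()

    n = 1000
    start = time.time()
    for i in range(n):
        cur.execute("INSERT INTO fsync_probe VALUES (%s, %s)", (i, "x" * 100))
        conn.commit()                    # one WAL flush per transaction when fsync is on
    elapsed = time.time() - start
    print("%.0f single-row commits/sec" % (n / elapsed))

    cur.close()
    conn.close()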

-- 
greg



Postgres on RAID5

2005-03-11 Thread Arshavir Grigorian
Hi,
I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
an Ext3 filesystem which is used by Postgres. Currently we are loading a
50G database on this server from a Postgres dump (copy, not insert) and
are experiencing very slow write performance (35 records per second).
Top shows that the Postgres process (postmaster) is being constantly put
into D state for extended periods of time (2-3 seconds) which I assume
is because it's waiting for disk io. I have just started gathering
system statistics and here is what sar -b shows: (this is while the db
is being loaded - pg_restore)
             tps     rtps     wtps   bread/s   bwrtn/s
01:35:01 PM  275.77   76.12   199.66   709.59   2315.23
01:45:01 PM  287.25   75.56   211.69   706.52   2413.06
01:55:01 PM  281.73   76.35   205.37   711.84   2389.86
02:05:01 PM  282.83   76.14   206.69   720.85   2418.51
02:15:01 PM  284.07   76.15   207.92   707.38   2443.60
02:25:01 PM  265.46   75.91   189.55   708.87   2089.21
02:35:01 PM  285.21   76.02   209.19   709.58   2446.46
Average:     280.33   76.04   204.30   710.66   2359.47
This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM.
It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom
compiled kernel. Postgres is installed from the Debian package and uses
all the configuration defaults.
I am also copying the pgsql-performance list.
Thanks in advance for any advice/pointers.
Arshavir
Following is some other info that might be helpful.
/proc/scsi# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.00
  Creation Time : Wed Feb 23 17:23:41 2005
 Raid Level : raid5
 Array Size : 123823616 (118.09 GiB 126.80 GB)
Device Size : 8844544 (8.43 GiB 9.06 GB)
   Raid Devices : 15
  Total Devices : 17
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Feb 24 10:05:38 2005
  State : active
 Active Devices : 15
Working Devices : 16
 Failed Devices : 1
  Spare Devices : 1
 Layout : left-symmetric
 Chunk Size : 64K
   UUID : 81ae2c97:06fa4f4d:87bfc6c9:2ee516df
 Events : 0.8
    Number   Major   Minor   RaidDevice State
       0       8       64        0      active sync   /dev/sde
       1       8       80        1      active sync   /dev/sdf
       2       8       96        2      active sync   /dev/sdg
       3       8      112        3      active sync   /dev/sdh
       4       8      128        4      active sync   /dev/sdi
       5       8      144        5      active sync   /dev/sdj
       6       8      160        6      active sync   /dev/sdk
       7       8      176        7      active sync   /dev/sdl
       8       8      192        8      active sync   /dev/sdm
       9       8      208        9      active sync   /dev/sdn
      10       8      224       10      active sync   /dev/sdo
      11       8      240       11      active sync   /dev/sdp
      12      65        0       12      active sync   /dev/sdq
      13      65       16       13      active sync   /dev/sdr
      14      65       32       14      active sync   /dev/sds
      15      65       48       15      spare         /dev/sdt
# dumpe2fs -h /dev/md1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:   
Last mounted on:  
Filesystem UUID:  1bb95bd6-94c7-4344-adf2-8414cadae6fc
Filesystem magic number:  0xEF53
Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal dir_index needs_recovery large_file
Default mount options:(none)
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
Inode count:  15482880
Block count:  30955904
Reserved block count: 1547795
Free blocks:  28767226
Free inodes:  15482502
First block:  0
Block size:   4096
Fragment size:4096
Blocks per group: 32768
Fragments per group:  32768
Inodes per group: 16384
Inode blocks per group:   512
Filesystem created:   Wed Feb 23 17:27:13 2005
Last mount time:  Wed Feb 23 17:45:25 2005
Last write time:  Wed Feb 23 17:45:25 2005
Mount count:  2
Maximum mount count:  28
Last checked: Wed Feb 23 17:27:13 2005
Check interval:   15552000 (6 months)
Next check after: Mon Aug 22 18:27:13 2005
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   128
Journal inode:8
Default directory hash:   tea
Directory Hash Seed:  c35c0226-3b52-4dad-b102-f22feb773592
Journal backup:   inode blocks
# lspci | grep SCSI
:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875
(rev 14)
:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875
(rev 14)
:00:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875
(rev 14)
:00:04.1 SCSI storag
