Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-14 Thread Scott Marlowe
On Tue, Dec 11, 2012 at 6:03 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 On 12/12/2012 06:44 AM, Evgeny Shishkin wrote:


 On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt
 nielskrist...@autouncle.com wrote:

 Are you using a hardware based raid controller with them?

 Yes, of course. Hardware RAID with cache and BBU is a must. You can't get
 fast fsync without it.


 Most SSDs should offer fairly fast fsync without a hardware RAID controller,
 as they do write-back caching. The trick is to find ones that do write-back
 caching safely, so you don't get severe data corruption on power-loss.

 A HW RAID controller is an absolute must for rotating magnetic media,
 though.


 Also, mdadm is a pain in the ass and is suitable only on Amazon and other
 cloud shit.


 I've personally been pretty happy with mdadm. I find the array portability
 it offers very useful, so I don't need to buy a second RAID controller just
 in case my main controller dies and I need a compatible one to get the array
 running again. If you don't need a BBU for safe write-back caching then
 mdadm has advantages over hardware RAID.

 I'll certainly use mdadm over onboard fakeraid solutions or low-end hardware
 RAID controllers. I suspect a mid- to high end HW RAID unit will generally
 win.

Also, for sequential throughput, md RAID is usually faster than most
RAID controllers, even the high-end Areca and LSI ones.




Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-13 Thread Jeff Janes
On Wed, Dec 12, 2012 at 8:46 AM, Niels Kristian Schjødt
nielskrist...@autouncle.com wrote:

 On 11/12/2012 at 18.25, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
 nielskrist...@autouncle.com wrote:

 Maybe I should mention that I never see more than 5GB out of my total
 32GB being in use on the server… Can I somehow utilize more of it?

 What tool do you use to determine that?  Is that on top of the 4GB
 shared_buffers, or including it?

 Okay, I might not have made myself clear; I was talking about physical memory
 utilization. Here are the stats:
 free -m
              total   used   free  shared  buffers  cached
 Mem:         32075  25554   6520       0       69   22694
 -/+ buffers/cache:   2791  29284
 Swap:         2046    595   1451

I don't know how you get 5 Gig from that, though.  You have 22 Gig of
cached file-system, which for your purposes probably counts as being
utilized.  Although we don't know how much of this is for postgres
data files, chances are it is a pretty good chunk.



 How big is your entire data set?  Maybe all your data fits in 5GB
 (believable, as all your indexes listed below sum to < 2.5GB) so there
 is no need to use more.

 It doesn't. We are a search engine for used cars, and there are quite a lot of 
 those out there :-)

But how big?  More than 22GB?  (you can use \l+ in psql, or du -s on
the data directory)
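
For example (using the Ubuntu-default data directory from the original post;
pg_size_pretty and pg_database_size are standard built-ins):

-- in psql: per-database sizes
\l+
SELECT pg_size_pretty(pg_database_size(current_database()));

# or from the shell
du -sh /var/lib/postgresql/9.2/main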

 However, my indexes are almost all partial indexes, which means that they 
 only cover cars that are still for sale, so in that sense the indexes 
 themselves don't really grow, but the tables do.

So maybe this reverses things.  If your car table is huge and the
active cars are scattered randomly among all the inactive ones, then
updating random active cars is going to generate a lot of randomly
scattered writing which can't be combined into sequential writes.

Do you have plans for archiving cars that are no longer for sale?  Why
do you keep them around in the first place, i.e. what types of queries
do you do on inactive ones?

Unfortunately you currently can't use CLUSTER with partial indexes,
otherwise that might be a good idea.  You could build a full index on
whatever it is you use as the criterion for the partial indexes,
cluster on that, and then drop it.
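
A minimal SQL sketch of that trick (the table and column names here are
hypothetical, assuming the partial indexes share a WHERE for_sale predicate):

CREATE INDEX cars_for_sale_full_idx ON cars (for_sale);  -- full, non-partial index
CLUSTER cars USING cars_for_sale_full_idx;               -- rewrites the table in index order
ANALYZE cars;                                            -- refresh stats after the rewrite
DROP INDEX cars_for_sale_full_idx;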

But the table would eventually become unclustered again, so if this
works you might want to implement partitioning between active and
inactive partitions so as to maintain the clustering.
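
On 9.2 that would be inheritance-based partitioning; a rough sketch with the
same hypothetical names:

CREATE TABLE cars_active   (CHECK (for_sale))     INHERITS (cars);
CREATE TABLE cars_inactive (CHECK (NOT for_sale)) INHERITS (cars);
-- plus a trigger or application logic to route inserts, and a job that
-- moves rows from cars_active to cars_inactive once a car is sold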


 You could really crank up shared_buffers or vm.dirty_background_ratio,
 but doing so might cause problems with checkpoints stalling and
 latency spikes.  That would probably not be a problem during the
 night, but could be during the day.

 What do you have in mind here? Tweaking what parameters to what values?

 I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your

 I had that before; Shaun suggested that I change it to 4GB, as he was talking 
 about a strange behavior when it is larger than that on 12.04. But I can say 
 that there has not been any notable difference between having it at 4GB and at 8GB.

It is almost an all-or-nothing thing.  If you need 16 or 20GB, just
going from 4 to 8 isn't going to show much difference.  If you can
test this easily, I'd just set it to 24 or even 28GB and run the bulk
update.  I don't think you'd want to run a server permanently at those
settings, but it is an easy way to rule in or out different theories
about what is going on.
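
Note that the sysctl settings posted earlier cap kernel.shmmax at 8GB, and 9.2
still allocates one big System V shared memory segment, so a test like that
would also need something like:

# /etc/sysctl.conf - allow a shared memory segment > 24GB (value is illustrative)
kernel.shmmax = 34359738368
# postgresql.conf - experiment only, requires a restart
shared_buffers = 24GB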

 But if your database is growing so rapidly that it soon won't fit on
 240GB, then cranking up shared_buffers won't do for long.  If you can
 get your tables and all of their indexes clustered together, then you
 can do the updates in an order that makes IO more efficient.  Maybe
 partitioning would help.

 Can you explain a little more about this, or provide me a good link?

If all your partial btree indexes are using the same WHERE clause,
then your indexes are already clustered together in a sense--a partial
index is kind of like a composite index with the WHERE clause as the
first column.
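
Roughly, with hypothetical names:

-- partial index: only rows with for_sale = true are indexed
CREATE INDEX cars_price_partial_idx ON cars (price) WHERE for_sale;
-- behaves much like the for_sale = true slice of this composite index
CREATE INDEX cars_for_sale_price_idx ON cars (for_sale, price);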

So the trick would be to get the table to be clustered on the same
thing--either by partitioning or by the CLUSTER command, or something
equivalent to those.  I don't know of a good link, other than the
documentation (which is more about how to do it, rather than why you
would want to or how to design it).

Cheers,

Jeff




Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-12 Thread Niels Kristian Schjødt
Well, in fact I do (as you can see from my configuration). I have a similar 
server running with hot standby replication - and it runs two 3TB HDDs in a RAID1 
array.

So - is it still very bad if I choose to put four Intel 520 disks in a RAID10 
array on the other production server?

On 12/12/2012 at 03.47, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/12/2012 10:13 AM, Evgeny Shishkin wrote:
 
  Yes, I am aware of this issue. I never experienced it on either the Intel 520 
  or the OCZ Vertex 3.
 
 
 I wouldn't trust either of those drives. The 520 doesn't have Intel's  
 Enhanced Power Loss Data Protection; it's going to lose its buffers if it 
 loses power. Similarly, the Vertex 3 doesn't have any kind of power 
 protection. See:
 
 http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-series-power-loss-data-protection-brief.html
 http://ark.intel.com/products/family/56572/Intel-SSD-500-Family
 
 http://www.ocztechnology.com/res/manuals/OCZ_SSD_Breakdown_Q2-11_1.pdf
 
  The only way I'd use those for a production server would be if I had synchronous 
 replication running to another machine with trustworthy, durable storage - 
 and if I didn't mind some downtime to restore the corrupt DB from the replica 
 after power loss.
 
 -- 
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-12 Thread Niels Kristian Schjødt

On 11/12/2012 at 18.25, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
 nielskrist...@autouncle.com wrote:
 On 11/12/2012 at 00.58, Jeff Janes jeff.ja...@gmail.com wrote:
 
 
 The fact that there is much more writing than reading tells me that
 most of your indexes are in RAM.  The amount of index you are rapidly
 reading and dirtying is large enough to fit in RAM, but is not large
 enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
 
 Maybe I should mention that I never see more than 5GB out of my total 
 32GB being in use on the server… Can I somehow utilize more of it?
 
 What tool do you use to determine that?  Is that on top of the 4GB
 shared_buffers, or including it?

Okay, I might not have made myself clear; I was talking about physical memory 
utilization. Here are the stats:
free -m
             total   used   free  shared  buffers  cached
Mem:         32075  25554   6520       0       69   22694
-/+ buffers/cache:   2791  29284
Swap:         2046    595   1451
 
 How big is your entire data set?  Maybe all your data fits in 5GB
 (believable, as all your indexes listed below sum to < 2.5GB) so there
 is no need to use more.

It doesn't. We are a search engine for used cars, and there are quite a lot of 
those out there :-) However, my indexes are almost all partial indexes, which 
means that they only cover cars that are still for sale, so in that sense the 
indexes themselves don't really grow, but the tables do.

 
 Or maybe you have hit a bug in the 3.2 kernel.  At least one of those
 has been frequently discussed.
 
Might be true - but likely?
 
 You could really crank up shared_buffers or vm.dirty_background_ratio,
 but doing so might cause problems with checkpoints stalling and
 latency spikes.  That would probably not be a problem during the
 night, but could be during the day.
 
 What do you have in mind here? Tweaking what parameters to what values?
 
 I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your

I had that before; Shaun suggested that I change it to 4GB, as he was talking 
about a strange behavior when it is larger than that on 12.04. But I can say 
that there has not been any notable difference between having it at 4GB and at 8GB.

 data) and see what happens.  And probably increase checkpoint_timeout
 and checkpoint_segments about 3x each. Also, turn on log_checkpoints
 so you can see what kinds of problems those changes may be causing
 there (i.e. long sync times).  Preferably you do this on some kind of
 pre-production or test server.
 
 But if your database is growing so rapidly that it soon won't fit on
 240GB, then cranking up shared_buffers won't do for long.  If you can
 get your tables and all of their indexes clustered together, then you
 can do the updates in an order that makes IO more efficient.  Maybe
 partitioning would help.

Can you explain a little more about this, or provide me a good link?
 
 
 I don't know how big each disk is, or how big your various categories
 of data are.  Could you move everything to SSD?  Could you move all
 your actively updated indexes there?
 
 With tablespaces, you mean?
 
 Yes.  Or moving everything to SSD if it fits, then you don't have to go
 through and separate objects.
 
 The UPDATE you posted in a previous thread looked like the table
 blocks might also be getting dirtied in a fairly random order, which
 means the table blocks are in the same condition as the index blocks
 so maybe singling out the indexes isn't warranted.
 
 Cheers,
 
 Jeff





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-12 Thread Craig Ringer
On 13/12/2012 12:22 AM, Niels Kristian Schjødt wrote:
 Well, in fact I do (as you can see from my configuration). I have a
 similar server running with hot standby replication - and it runs two
 3TB HDDs in a RAID1 array.

 So - is it still very bad if I choose to put four Intel 520 disks in a
 RAID10 array on the other production server?
So long as you have it recording to a synchronous replica on another
machine and you're fully prepared to accept the small risk that you'll
have total and unrecoverable data corruption on that server, with the
corresponding downtime while you rebuild it from the replica, it should
be OK.

Alternately, you could use PITR with a base backup to ship WAL to another
machine or a reliable HDD, so you can recover all but the last
checkpoint_timeout minutes of data from the base backup + WAL. There's a
small window of data loss that way, but you don't need a second machine
as a streaming replication follower. barman might be worth checking out
as a management tool for PITR backups.
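
A rough sketch of that approach (hostnames and paths are illustrative; the
archive_command mirrors the rsync-based one already in the posted
configuration):

# one-time base backup of the primary, taken onto the backup host
pg_basebackup -h master-db -U postgres -D /backup/base
# postgresql.conf on the primary: continuous WAL shipping
archive_mode = on
archive_command = 'rsync -a %p backuphost:/backup/wal/%f'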

If the data is fairly low-value you could even just take nightly backups
and accept the risk of losing some data.

--
Craig Ringer


Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Niels Kristian Schjødt
On 11/12/2012 at 00.58, Jeff Janes jeff.ja...@gmail.com wrote:

 On Mon, Dec 10, 2012 at 2:51 PM, Niels Kristian Schjødt
 nielskrist...@autouncle.com wrote:
 
 synchronous_commit = off
 
 The pg_xlog folder has been moved onto the SSD array (md3), and symlinked
 back into the postgres dir.
 
 With synchronous_commit = off, or with large transactions, there is
 probably no advantage to moving those to SSD.
 
 
 2)
  When the database is loaded like this, I see a lot of queries taking up to
  1000 times as long as they would when the database is not loaded so
  heavily.
 
 What kinds of queries are they?  single-row look-ups, full table scans, etc.
Well, mostly they are updates, like the one shown in the previous question I 
referenced.
 
  Notes and thoughts
 ##
 
 As you can see, even though I have moved the pg_xlog folder to the SSD array
 (md3), by far the largest amount of writes still goes to the regular HDDs
 (md2), which puzzles me - what can that be?
 
 Every row you insert or non-HOT update has to do maintenance on all
 indexes of that table.  If the rows are not inserted/updated in index
 order, this means every row inserted/updated dirties a randomly
 scattered 8KB for each of the indexes.  If you have lots of indexes
 per table, that adds up fast.
 
 The fact that there is much more writing than reading tells me that
 most of your indexes are in RAM.  The amount of index you are rapidly
 reading and dirtying is large enough to fit in RAM, but is not large
 enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
Maybe I should mention that I never see more than 5GB out of my total 32GB 
being in use on the server… Can I somehow utilize more of it?
 So you are redirtying the same blocks over and over, PG is
 desperately dumping them to the kernel (because shared_buffers is too
 small to hold them) and the kernel is desperately dumping them to
 disk, because vm.dirty_background_ratio is so low.  There is little
 opportunity for write-combining, because they don't sit in memory long
 enough to accumulate neighbors.
 
 How big are your indexes?
Here is a list of all my index sizes: 117 MB, 118 MB, 11 MB, 12 MB, 12 MB, 12 
MB, 12 MB, 140 MB, 15 MB, 15 MB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 
kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 kB, 16 MB, 16 MB, 176 kB, 176 kB, 17 MB, 18 
MB, 19 MB, 23 MB, 240 kB, 24 MB, 256 kB, 25 MB, 25 MB, 26 MB, 26 MB, 27 MB, 27 
MB, 27 MB, 27 MB, 280 MB, 2832 kB, 2840 kB, 288 kB, 28 MB, 28 MB, 28 MB, 28 MB, 
28 MB, 28 MB, 28 MB, 28 MB, 29 MB, 29 MB, 3152 kB, 3280 kB, 32 kB, 32 MB, 32 
MB, 3352 kB, 3456 kB, 34 MB, 36 MB, 3744 kB, 3776 kB, 37 MB, 37 MB, 3952 kB, 
400 kB, 408 kB, 40 kB, 40 kB, 40 kB, 416 kB, 416 kB, 42 MB, 432 kB, 4520 kB, 
4720 kB, 47 MB, 48 kB, 496 kB, 49 MB, 512 kB, 52 MB, 52 MB, 5304 kB, 5928 kB, 
6088 kB, 61 MB, 6224 kB, 62 MB, 6488 kB, 64 kB, 6512 kB, 71 MB, 72 kB, 72 kB, 
8192 bytes, 8400 kB, 88 MB, 95 MB, 98 MB
 You could really crank up shared_buffers or vm.dirty_background_ratio,
 but doing so might cause problems with checkpoints stalling and
 latency spikes.  That would probably not be a problem during the
 night, but could be during the day.
What do you have in mind here? Tweaking what parameters to what values?
 Rather than moving maintenance to the day and hoping it doesn't
 interfere with normal operations, I'd focus on making night-time
 maintenance more efficient, for example by dropping indexes (either
 just at night, or if some indexes are not useful, just get rid of them
 altogether), or cranking up shared_buffers at night, or maybe
 partitioning or look into pg_bulkload.
 
 From stat 3) (the iostat) I notice that the SSDs don't seem to be
 anywhere near fully utilized - maybe something other than just pg_xlog could
 be moved here?
 
 I don't know how big each disk is, or how big your various categories
 of data are.  Could you move everything to SSD?  Could you move all
 your actively updated indexes there?
With tablespaces, you mean?
 Or, more fundamentally, it looks like you spent too much on CPUs (86%
 idle) and not nearly enough on disks.  Maybe you can fix that for less
 money than it will cost you in your optimization time to make the best
 of the disks you already have.
The SSDs I use are 240GB each, which will grow too small within a few months 
- so - how does moving the whole data dir onto four of those in a RAID5 array 
sound?
 
 Cheers,
 
 Jeff





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Craig Ringer
On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:

 Maybe I should mention that I never see more than 5GB out of my total 
 32GB being in use on the server… Can I somehow utilize more of it?
For an update-mostly workload it probably won't do you tons of good so
long as all your indexes fit in RAM. You're clearly severely
bottlenecked on disk I/O not RAM.
 The SSDs I use are 240GB each, which will grow too small within a
 few months - so - how does moving the whole data dir onto four of
 those in a RAID5 array sound? 

Not RAID 5!

Use a RAID10 of four or six SSDs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Niels Kristian Schjødt

On 11/12/2012 at 14.29, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:
 
  Maybe I should mention that I never see more than 5GB out of my total 
  32GB being in use on the server… Can I somehow utilize more of it?
 For an update-mostly workload it probably won't do you tons of good so
 long as all your indexes fit in RAM. You're clearly severely
 bottlenecked on disk I/O not RAM.
  The SSDs I use are 240GB each, which will grow too small within a
 few months - so - how does moving the whole data dir onto four of
 those in a RAID5 array sound? 
 
 Not RAID 5!
 
 Use a RAID10 of four or six SSDs.
 
 -- 
 Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
Hehe, got it - did you have a look at the SSDs I am considering building it from? 
http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
 
Are they suitable, do you think?





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 11, 2012, at 5:35 PM, Niels Kristian Schjødt 
nielskrist...@autouncle.com wrote:

 
 On 11/12/2012 at 14.29, Craig Ringer cr...@2ndquadrant.com wrote:
 
 On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:
 
  Maybe I should mention that I never see more than 5GB out of my total 
  32GB being in use on the server… Can I somehow utilize more of it?
 For an update-mostly workload it probably won't do you tons of good so
 long as all your indexes fit in RAM. You're clearly severely
 bottlenecked on disk I/O not RAM.
  The SSDs I use are 240GB each, which will grow too small within a
 few months - so - how does moving the whole data dir onto four of
 those in a RAID5 array sound? 
 
 Not RAID 5!
 
 Use a RAID10 of four or six SSDs.
 
 -- 
 Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
 Hehe, got it - did you have a look at the SSDs I am considering building it 
 from? 
 http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
  
 Are they suitable, do you think?
 

I am not Craig, but I use them in production in a RAID10 array now.

 
 





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Jeff Janes
On Tue, Dec 11, 2012 at 2:04 AM, Niels Kristian Schjødt
nielskrist...@autouncle.com wrote:
 On 11/12/2012 at 00.58, Jeff Janes jeff.ja...@gmail.com wrote:


 The fact that there is much more writing than reading tells me that
 most of your indexes are in RAM.  The amount of index you are rapidly
 reading and dirtying is large enough to fit in RAM, but is not large
 enough to fit in shared_buffers + kernel's dirty-buffer comfort level.

 Maybe I should mention that I never see more than 5GB out of my total 
 32GB being in use on the server… Can I somehow utilize more of it?

What tool do you use to determine that?  Is that on top of the 4GB
shared_buffers, or including it?

How big is your entire data set?  Maybe all your data fits in 5GB
(believable, as all your indexes listed below sum to < 2.5GB) so there
is no need to use more.

Or maybe you have hit a bug in the 3.2 kernel.  At least one of those
has been frequently discussed.


 You could really crank up shared_buffers or vm.dirty_background_ratio,
 but doing so might cause problems with checkpoints stalling and
 latency spikes.  That would probably not be a problem during the
 night, but could be during the day.

 What do you have in mind here? Tweaking what parameters to what values?

I'd set shared_buffers to 20GB (or 10GB, if that will hold all of your
data) and see what happens.  And probably increase checkpoint_timeout
and checkpoint_segments about 3x each. Also, turn on log_checkpoints
so you can see what kinds of problems those changes may be causing
there (i.e. long sync times).  Preferably you do this on some kind of
pre-production or test server.
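
Relative to the configuration posted earlier in the thread (checkpoint_timeout
= 10min, checkpoint_segments = 100), that suggestion amounts to roughly:

# postgresql.conf - test values only
shared_buffers = 20GB
checkpoint_timeout = 30min
checkpoint_segments = 300
log_checkpoints = on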

But if your database is growing so rapidly that it soon won't fit on
240GB, then cranking up shared_buffers won't do for long.  If you can
get your tables and all of their indexes clustered together, then you
can do the updates in an order that makes IO more efficient.  Maybe
partitioning would help.


 I don't know how big each disk is, or how big your various categories
 of data are.  Could you move everything to SSD?  Could you move all
 your actively updated indexes there?

 With tablespaces, you mean?

Yes.  Or moving everything to SSD if it fits, then you don't have to go
through and separate objects.
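
A minimal sketch of the tablespace route (the mount point and index name are
hypothetical):

-- one-time setup, with the SSD array mounted at /ssd
CREATE TABLESPACE ssd LOCATION '/ssd/postgresql';
-- move a hot index there without changing any queries
ALTER INDEX cars_price_partial_idx SET TABLESPACE ssd;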

The UPDATE you posted in a previous thread looked like the table
blocks might also be getting dirtied in a fairly random order, which
means the table blocks are in the same condition as the index blocks
so maybe singling out the indexes isn't warranted.

Cheers,

Jeff




Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Niels Kristian Schjødt
And what is your experience so far?
On 11/12/2012 18.16, Evgeny Shishkin itparan...@gmail.com wrote:


 On Dec 11, 2012, at 5:35 PM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:

 
  On 11/12/2012 at 14.29, Craig Ringer cr...@2ndquadrant.com wrote:
 
  On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:
 
  Maybe I should mention that I never see more than 5GB out of my
 total 32GB being in use on the server… Can I somehow utilize more of it?
  For an update-mostly workload it probably won't do you tons of good so
  long as all your indexes fit in RAM. You're clearly severely
  bottlenecked on disk I/O not RAM.
  The SSDs I use are 240GB each, which will grow too small within a
  few months - so - how does moving the whole data dir onto four of
  those in a RAID5 array sound?
 
  Not RAID 5!
 
  Use a RAID10 of four or six SSDs.
 
  --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
  Hehe, got it - did you have a look at the SSDs I am considering building
 it from?
 http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
  Are they suitable, do you think?
 

 I am not Craig, but I use them in production in a RAID10 array now.

 
 




Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 11, 2012, at 10:54 PM, Niels Kristian Schjødt 
nielskrist...@autouncle.com wrote:

 And what is your experience so far?
 
Increased TPS by a factor of 10; the database is no longer a limiting factor 
for the application.
And it is cheaper than brand-name rotating drives.


 On 11/12/2012 18.16, Evgeny Shishkin itparan...@gmail.com wrote:
 
 On Dec 11, 2012, at 5:35 PM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:
 
 
  On 11/12/2012 at 14.29, Craig Ringer cr...@2ndquadrant.com wrote:
 
  On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:
 
  Maybe I should mention that I never see more than 5GB out of my 
  total 32GB being in use on the server… Can I somehow utilize more of it?
  For an update-mostly workload it probably won't do you tons of good so
  long as all your indexes fit in RAM. You're clearly severely
  bottlenecked on disk I/O not RAM.
  The SSDs I use are 240GB each, which will grow too small within a
  few months - so - how does moving the whole data dir onto four of
  those in a RAID5 array sound?
 
  Not RAID 5!
 
  Use a RAID10 of four or six SSDs.
 
  --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
  Hehe, got it - did you have a look at the SSDs I am considering building it 
  from? 
  http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
  Are they suitable, do you think?
 
 
 I am not Craig, but I use them in production in a RAID10 array now.
 
 
 
 



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Niels Kristian Schjødt
Are you using a hardware based raid controller with them?
On 11/12/2012 20.11, Evgeny Shishkin itparan...@gmail.com wrote:


 On Dec 11, 2012, at 10:54 PM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:

 And what is your experience so far?

 Increased TPS by a factor of 10; the database is no longer a limiting factor
 for the application.
 And it is cheaper than brand-name rotating drives.


 On 11/12/2012 18.16, Evgeny Shishkin itparan...@gmail.com wrote:


 On Dec 11, 2012, at 5:35 PM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:

 
  On 11/12/2012 at 14.29, Craig Ringer cr...@2ndquadrant.com wrote:
 
  On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:
 
  Maybe I should mention that I never see more than 5GB out of my
 total 32GB being in use on the server… Can I somehow utilize more of it?
  For an update-mostly workload it probably won't do you tons of good so
  long as all your indexes fit in RAM. You're clearly severely
  bottlenecked on disk I/O not RAM.
  The SSDs I use are 240GB each, which will grow too small within a
  few months - so - how does moving the whole data dir onto four of
  those in a RAID5 array sound?
 
  Not RAID 5!
 
  Use a RAID10 of four or six SSDs.
 
  --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
 
  Hehe, got it - did you have a look at the SSDs I am considering
 building it from?
 http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
  Are they suitable, do you think?
 

 I am not Craig, but I use them in production in a RAID10 array now.

 
 





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt 
nielskrist...@autouncle.com wrote:

 Are you using a hardware based raid controller with them?
 
Yes, of course. Hardware RAID with cache and BBU is a must. You can't get fast 
fsync without it.
Also, mdadm is a pain in the ass and is suitable only on Amazon and other cloud 
shit.

 On 11/12/2012 20.11, Evgeny Shishkin itparan...@gmail.com wrote:
 
 On Dec 11, 2012, at 10:54 PM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:
 
 And what is your experience so far?
 
  Increased TPS by a factor of 10; the database is no longer a limiting factor 
  for the application.
  And it is cheaper than brand-name rotating drives.
 
 
  On 11/12/2012 18.16, Evgeny Shishkin itparan...@gmail.com wrote:
 
 On Dec 11, 2012, at 5:35 PM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:
 
 
   On 11/12/2012 at 14.29, Craig Ringer cr...@2ndquadrant.com wrote:
 
  On 12/11/2012 06:04 PM, Niels Kristian Schjødt wrote:
 
   Maybe I should mention that I never see more than 5GB out of my 
   total 32GB being in use on the server… Can I somehow utilize more of it?
  For an update-mostly workload it probably won't do you tons of good so
  long as all your indexes fit in RAM. You're clearly severely
  bottlenecked on disk I/O not RAM.
   The SSDs I use are 240GB each, which will grow too small within a
  few months - so - how does moving the whole data dir onto four of
  those in a RAID5 array sound?
 
  Not RAID 5!
 
  Use a RAID10 of four or six SSDs.
 
  --
  Craig Ringer   http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services
 
   Hehe, got it - did you have a look at the SSDs I am considering building 
   it from? 
   http://ark.intel.com/products/66250/Intel-SSD-520-Series-240GB-2_5in-SATA-6Gbs-25nm-MLC
   Are they suitable, do you think?
 
 
  I am not Craig, but I use them in production in a RAID10 array now.
 
 
 
 
 



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Craig Ringer
On 12/12/2012 06:44 AM, Evgeny Shishkin wrote:

 On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt
 nielskrist...@autouncle.com wrote:

 Are you using a hardware based raid controller with them?

 Yes, of course. Hardware RAID with cache and BBU is a must. You can't
 get fast fsync without it.

Most SSDs should offer fairly fast fsync without a hardware RAID
controller, as they do write-back caching. The trick is to find ones
that do write-back caching safely, so you don't get severe data
corruption on power-loss.

A HW RAID controller is an absolute must for rotating magnetic media,
though.

 Also, mdadm is a pain in the ass and is suitable only on Amazon and
 other cloud shit.

I've personally been pretty happy with mdadm. I find the array
portability it offers very useful, so I don't need to buy a second RAID
controller just in case my main controller dies and I need a compatible
one to get the array running again. If you don't need a BBU for safe
write-back caching then mdadm has advantages over hardware RAID.

I'll certainly use mdadm over onboard fakeraid solutions or low-end
hardware RAID controllers. I suspect a mid- to high end HW RAID unit
will generally win.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 5:03 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/12/2012 06:44 AM, Evgeny Shishkin wrote:
 
 On Dec 12, 2012, at 2:41 AM, Niels Kristian Schjødt 
 nielskrist...@autouncle.com wrote:
 
 Are you using a hardware based raid controller with them?
 
  Yes, of course. Hardware RAID with cache and BBU is a must. You can't get 
  fast fsync without it.
 
 Most SSDs should offer fairly fast fsync without a hardware RAID controller, 
 as they do write-back caching. The trick is to find ones that do write-back 
 caching safely, so you don't get severe data corruption on power-loss. 
 

Actually, most low-end SSDs don't do write caching; they do not have enough 
RAM for that. SandForce, for example.

 A HW RAID controller is an absolute must for rotating magnetic media, though.
 
  Also, mdadm is a pain in the ass and is suitable only on Amazon and other 
  cloud shit.
 
 I've personally been pretty happy with mdadm. I find the array portability it 
 offers very useful, so I don't need to buy a second RAID controller just in 
 case my main controller dies and I need a compatible one to get the array 
 running again. If you don't need a BBU for safe write-back caching then mdadm 
 has advantages over hardware RAID.
 

If we are talking about a dedicated database machine with SSD drives, why 
wouldn't anybody buy a hardware RAID for about $500-700?

 I'll certainly use mdadm over onboard fakeraid solutions or low-end hardware 
 RAID controllers. I suspect a mid- to high end HW RAID unit will generally 
 win.
 

 -- 
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Craig Ringer
On 12/12/2012 09:17 AM, Evgeny Shishkin wrote:

 Actually, most low-end SSDs don't do write caching; they do not have
 enough RAM for that. SandForce, for example.

Or, worse, some of them do limited write caching but don't protect their
write cache from power loss. Instant data corruption!

I would be extremely reluctant to use low-end SSDs for a database server.

 If we are talking about a dedicated database machine with SSD
 drives, why wouldn't anybody buy a hardware RAID for about $500-700?
I'd want to consider whether the same money is better spent on faster,
higher quality SSDs with their own fast write caches.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 5:29 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/12/2012 09:17 AM, Evgeny Shishkin wrote:
 
  Actually, most low-end SSDs don't do write caching; they do not have 
  enough RAM for that. SandForce, for example.
 
 Or, worse, some of them do limited write caching but don't protect their 
 write cache from power loss. Instant data corruption!
 
 I would be extremely reluctant to use low-end SSDs for a database server.
 
  If we are talking about a dedicated database machine with SSD drives, why 
  wouldn't anybody buy a hardware RAID for about $500-700?
 I'd want to consider whether the same money is better spent on faster, higher 
 quality SSDs with their own fast write caches.
 

High-quality SSDs cost $5-7 per GB, consumer-grade SSDs about $1, and high-end 
about $11. The new Intel DC S3700 is $2-3 per GB, as far as I remember.

So far - more than a year already - I have used consumer SSDs with a $300-400 HW 
RAID. Cost-effective and fast, maybe not very safe, but so far so good. All data 
protection measures from PostgreSQL are on, of course.
 -- 
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Rosser Schwarz
On Tue, Dec 11, 2012 at 5:17 PM, Evgeny Shishkin itparan...@gmail.com wrote:

 Actually most of low-end SSDs don't do write caching, they do not have
 enough ram for that.


AIUI, *all* SSDs do write-caching of a sort: writes are actually flushed to
the NAND media by erasing, and then overwriting the erased space, and
erasing is done in fixed-size blocks, usually much larger than a
filesystem's pages.  The drive's controller accumulates writes in an
on-board cache until it has an erase block's worth of them, which are
then flushed.  From casual searching, a common erase block size is 256
kbytes, while filesystem-level pages are usually 4k (so one erase block
holds 64 filesystem pages).

Most low-end (and even many mid-range) SSDs, including Sandforce-based
drives, don't offer any form of protection (e.g., supercaps, as featured on
the Intel 320 and 710-series drives) for the data in that write cache,
however, which may be what you're thinking of.  I wouldn't let one of those
anywhere near one of my servers, unless it was a completely disposable,
load-balanced slave, and probably not even then.

rls

-- 
:wq


Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Craig Ringer
On 12/12/2012 09:44 AM, Evgeny Shishkin wrote:
 So far - more than a year already - I have used consumer SSDs with a $300-400
 HW RAID. Cost-effective and fast, maybe not very safe, but so far so
 good. All data protection measures from PostgreSQL are on, of course.

You're aware that many low-end SSDs lie to the RAID controller about
having written data, right? Even if the RAID controller sends a flush
command, the SSD might cache the write in non-durable cache. If you're
using such SSDs and you lose power, data corruption is extremely likely,
because your SSDs are essentially ignoring fsync.

Your RAID controller's BBU won't save you, because once the disks tell
the RAID controller the data has hit durable storage, the RAID
controller feels free to flush it from its battery backed cache. If the
disks are lying...

The only solid way to find out whether this is an issue with your SSDs is
to do plug-pull testing.
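
One commonly used tool for that is Brad Fitzpatrick's diskchecker.pl; a sketch
of a run (hostnames and paths are illustrative):

# on a second machine that stays powered on:
diskchecker.pl -l
# on the machine under test, generate a stream of fsync'd writes:
diskchecker.pl -s otherhost create /ssd/test_file 500
# pull the power mid-run, boot again, then check for lies about fsync:
diskchecker.pl -s otherhost verify /ssd/test_file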

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 6:02 AM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 12/12/2012 09:44 AM, Evgeny Shishkin wrote:
  So far - more than a year already - I have used consumer SSDs with a $300-400 HW 
  RAID. Cost-effective and fast, maybe not very safe, but so far so good. All 
  data protection measures from PostgreSQL are on, of course.
 
 You're aware that many low-end SSDs lie to the RAID controller about having 
 written data, right? Even if the RAID controller sends a flush command, the 
 SSD might cache the write in non-durable cache. If you're using such SSDs and 
 you lose power, data corruption is extremely likely, because your SSDs are 
 essentially ignoring fsync.
 
 Your RAID controller's BBU won't save you, because once the disks tell the 
 RAID controller the data has hit durable storage, the RAID controller feels 
 free to flush it from its battery backed cache. If the disks are lying...
 
 The only solid way to find out whether this is an issue with your SSDs is to do 
 plug-pull testing.
 

Yes, I am aware of this issue. I never experienced it on either the Intel 520 or 
the OCZ Vertex 3.
Have you heard of such cases on this list?
 -- 
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:13 PM, Evgeny Shishkin wrote:
Yes, I am aware of this issue. I never experienced it on either the Intel 
520 or the OCZ Vertex 3.

Have you heard of such cases on this list?
People have done plug-pull tests and reported the results on the list 
(sometime in the past couple of years).


But you don't need to do the test to know these drives are not safe. 
They're unsafe by design.







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 6:15 AM, David Boreham david_l...@boreham.org wrote:

 On 12/11/2012 7:13 PM, Evgeny Shishkin wrote:
 Yes, I am aware of this issue. I never experienced it on either the Intel 520 
 or the OCZ Vertex 3.
 Have you heard of such cases on this list?
 People have done plug-pull tests and reported the results on the list 
 (sometime in the past couple of years).
 
 But you don't need to do the test to know these drives are not safe. They're 
 unsafe by design.
 

Oh, there is no 100% safe system. The only way to be sure is to read the data back.
Everything about system design is a tradeoff between cost and risk.

 
 
 





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:20 PM, Evgeny Shishkin wrote:

Oh, there is no 100% safe system.
In this case we're discussing specifically safety in the event of power 
loss shortly after the drive indicates to the controller that it has 
committed a write operation. Some drives do provide 100% safety against 
this event, and they don't cost much more than those that don't.







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 6:26 AM, David Boreham david_l...@boreham.org wrote:

 On 12/11/2012 7:20 PM, Evgeny Shishkin wrote:
 Oh, there is no 100% safe system.
 In this case we're discussing specifically safety in the event of power loss 
 shortly after the drive indicates to the controller that it has committed a 
 write operation. Some drives do provide 100% safety against this event, and 
 they don't cost much more than those that don't.

Which drives would you recommend, besides the Intel 320 and 710?


 
 





Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:38 PM, Evgeny Shishkin wrote:

Which drives would you recommend, besides the Intel 320 and 710?
Those are the only drive types we have deployed in servers at present 
(almost all 710, but we have some 320 for less mission-critical 
machines). The new DC-S3700 Series looks nice too, but isn't yet in the 
sales channel:

http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Craig Ringer
On 12/12/2012 10:13 AM, Evgeny Shishkin wrote:

 Yes, I am aware of this issue. I never experienced it on either the Intel
 520 or the OCZ Vertex 3.


I wouldn't trust either of those drives. The 520 doesn't have Intel's 
Enhanced Power Loss Data Protection; it's going to lose its buffers if
it loses power. Similarly, the Vertex 3 doesn't have any kind of power
protection. See:

http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-series-power-loss-data-protection-brief.html
http://ark.intel.com/products/family/56572/Intel-SSD-500-Family

http://www.ocztechnology.com/res/manuals/OCZ_SSD_Breakdown_Q2-11_1.pdf

The only way I'd use those for a production server would be if I had
synchronous replication running to another machine with trustworthy,
durable storage - and if I didn't mind some downtime to restore the
corrupt DB from the replica after power loss.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 6:41 AM, David Boreham david_l...@boreham.org wrote:

 On 12/11/2012 7:38 PM, Evgeny Shishkin wrote:
 Which drives would you recommend, besides the Intel 320 and 710?
 Those are the only drive types we have deployed in servers at present (almost 
 all 710, but we have some 320 for less mission-critical machines). The new 
 DC-S3700 Series looks nice too, but isn't yet in the sales channel :
 http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html

Yeah, the S3700 looks promising, but the SATA interface is a limiting factor for 
this drive.
I'm looking towards the SMART SSD: 
http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review

but I haven't heard of it anywhere else.


 
 
 



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Mark Kirkwood

On 12/12/12 15:41, David Boreham wrote:

On 12/11/2012 7:38 PM, Evgeny Shishkin wrote:

Which drives would you recommend, besides the Intel 320 and 710?
Those are the only drive types we have deployed in servers at present 
(almost all 710, but we have some 320 for less mission-critical 
machines). The new DC-S3700 Series looks nice too, but isn't yet in 
the sales channel :
http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html 








+1

The s3700 is probably the one to get (when it is available). I'd opt for 
the 710 if you need something now. I'd avoid the 320 - we have 
encountered the firmware bug whereby you get an 8MB (yes 8MB) capacity 
after powerdown with a depressingly large number of them (they were 
updated to the latest firmware too).


Regards

Mark




Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 7:49 PM, Evgeny Shishkin wrote:
Yeah, the S3700 looks promising, but the SATA interface is a limiting factor for 
this drive.
I'm looking towards the SMART SSD: 
http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review



What don't you like about SATA?

I prefer to avoid SAS drives if possible due to the price premium for 
dubious benefits besides vague hand-waving enterprise-ness promises.







Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread Evgeny Shishkin

On Dec 12, 2012, at 7:05 AM, David Boreham david_l...@boreham.org wrote:

 On 12/11/2012 7:49 PM, Evgeny Shishkin wrote:
 Yeah, the S3700 looks promising, but the SATA interface is a limiting factor 
 for this drive.
 I'm looking towards the SMART SSD: 
 http://www.storagereview.com/smart_storage_systems_optimus_sas_enterprise_ssd_review
 
 What don't you like about SATA?
 
 I prefer to avoid SAS drives if possible due to the price premium for dubious 
 benefits besides vague hand-waving enterprise-ness promises.
 

Quoting 
http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review

Intel makes the case that the S3700 is the ideal drive for entry, mainstream 
and performance enterprise computing including HPC use cases. The claim is 
bold, largely because of the decision to go with a SATA interface, which has 
several limitations in the enterprise. The SATA interface tops out at a queue 
depth of 32 (SAS scales as high as 256 in most cases), which means that when 
requests go above that level, average and peak latency spike, as we saw in all of 
our workloads.

Another huge advantage of SAS is the ability to offer dual-port modes for high 
availability scenarios, where there are two controllers interfacing with the 
same drive at the same time. In the event one goes offline, the connection with 
the SSD is not lost, as it would with a standard SATA interface without 
additional hardware. Some SAS drives also offer wide-port configurations used 
to increase total bandwidth above a single-link connection. While the Intel SSD 
DC S3700 against other SATA competitors is very fast, the story changes when 
you introduce the latest MLC and SLC-based SAS SSDs, which can cope better with 
increased thread and queue levels.

We picked the primary post-preconditioning sections of our benchmarks after 
each SSD had reached steady-state. For the purposes of this section, we added 
the Intel SSD DC S3700 onto the throughput charts of the newest SAS 
high-performance SSDs. There are also significant latency differences at higher 
queue depths that play a significant factor, but for the sake of easy 
comparison we stick with raw I/O speed across varying thread and queue counts.

In a 100% 4K random write or random read scenario, the Intel SSD DC 3700 
performs quite well up against the high-end SAS competition, with the second 
fastest 4K steady-state speed. When you switch focus to read throughput at a 
heavy 16T/16Q load it only offers 1/2 to 1/3 the performance of SSDs in this 
category.

http://www.storagereview.com/images/intel_ssd_dc_s3700_main_slc_4kwrite_throughput.png


 
 
 



Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-11 Thread David Boreham

On 12/11/2012 8:11 PM, Evgeny Shishkin wrote:


Quoting 
http://www.storagereview.com/intel_ssd_dc_s3700_series_enterprise_ssd_review

Heh. A fine example of the kind of hand-waving of which I spoke ;)

Higher performance is certainly a benefit, although at present we can't 
saturate even a single 710 series drive (the application, CPU, OS, etc. 
is the bottleneck). Similarly, while dual-porting certainly has its uses, 
it is not something I need.








Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-10 Thread Evgeny Shishkin

On Dec 11, 2012, at 2:51 AM, Niels Kristian Schjødt 
nielskrist...@autouncle.com wrote:

  Pitch 
 ##
 I previously posted this question 
 http://archives.postgresql.org/pgsql-performance/2012-11/msg00289.php about a 
 performance issue with an update query. 
 The question evolved into a more general discussion about my setup, and about 
 a lot of I/O wait that I was encountering. Since then, I have gotten a whole 
 lot more familiar with measuring things, and now I just need some 
 experienced eyes to judge which direction I should go in - do I have a 
 hardware issue, or a software issue - and what action should I take?
 
 #  My setup 
 #
 The use case:
 At night time we are doing a LOT of data maintenance, and hence the load on 
 the database is very different from the day time. However we would like to be 
 able to do some of it in the daytime; it's simply just too heavy on the 
 database as is right now. The stats shown below are from one of those heavy 
 load times.
 
 Hardware: 
   - 32Gb ram 
   - 8 core Xeon E3-1245 processor
   - Two SEAGATE ST33000650NS drives (called sdc and sdd in the stats) in a 
 software RAID1 array (called md2 in the stats)
   - Two INTEL SSDSC2CW240A3 SSD drives (called sda and sdb in the stats) in a 
 software RAID1 (called md3 in the stats)
 
 Software:
 Postgres 9.2 running on 64bit ubuntu 12.04 with kernel 3.2
 
 Configuration:
 # postgresql.conf (a shortlist of everything changed from the default)
 data_directory = '/var/lib/postgresql/9.2/main'
 hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'
 ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'
 external_pid_file = '/var/run/postgresql/9.2-main.pid'
 listen_addresses = '192.168.0.2, localhost'
 port = 5432
 max_connections = 300
 unix_socket_directory = '/var/run/postgresql'
 wal_level = hot_standby
 synchronous_commit = off
 archive_mode = on
 archive_command = 'rsync -a %p 
 postgres@192.168.0.4:/var/lib/postgresql/9.2/wals/%f < /dev/null'
 max_wal_senders = 1
 wal_keep_segments = 32
 log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
 datestyle = 'iso, mdy'
 lc_monetary = 'en_US.UTF-8'
 lc_numeric = 'en_US.UTF-8'
 lc_time = 'en_US.UTF-8'
 default_text_search_config = 'pg_catalog.english'
 default_statistics_target = 100
 maintenance_work_mem = 1GB
 checkpoint_completion_target = 0.9
 effective_cache_size = 22GB
 work_mem = 160MB
 wal_buffers = 4MB
 checkpoint_segments = 100
 shared_buffers = 4GB
 checkpoint_timeout = 10min
 
 The kernel has been tweaked like so:
 vm.dirty_ratio = 10
 vm.dirty_background_ratio = 1
 kernel.shmmax = 8589934592
 kernel.shmall = 17179869184
 
 The pg_xlog folder has been moved onto the SSD array (md3), and symlinked 
 back into the postgres dir.
 

Actually, you should move xlog to the rotating drives, since WAL logs are written 
sequentially, and everything else to SSD, because of the random I/O pattern.


 # The stats 
 ###
 These are the typical observations/stats I see in one of these periods:
 
 1)
 At top level this is what I see in new relic:
 https://rpm.newrelic.com/public/charts/6ewGRle6bmc
 
 2)
 When the database is loaded like this, I see a lot of queries taking up to 
 1000 times as long as they would when the database is not loaded so heavily.
 
 3)
 sudo iostat -dmx (typical usage)
 Linux 3.2.0-33-generic (master-db)    12/10/2012    _x86_64_    (8 CPU)
 
 Device:  rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
 sda        0.00    6.52   3.59   26.61   0.22   0.74     65.49      0.01   0.40     0.77     0.35   0.14   0.43
 sdb        0.00    8.31   0.03   28.38   0.00   0.97     69.63      0.01   0.52     0.27     0.52   0.15   0.43
 sdc        1.71   46.01  34.83  116.62   0.56   4.06     62.47      1.90  12.57    21.81     9.81   1.89  28.66
 sdd        1.67   46.14  34.89  116.49   0.56   4.06     62.46      1.58  10.43    21.66     7.07   1.89  28.60
 md1        0.00    0.00   0.00    0.00   0.00   0.00      2.69      0.00   0.00     0.00     0.00   0.00   0.00
 md0        0.00    0.00   0.11    0.24   0.00   0.00      8.00      0.00   0.00     0.00     0.00   0.00   0.00
 md2        0.00    0.00  72.99  161.95   1.11   4.06     45.10      0.00   0.00     0.00     0.00   0.00   0.00
 md3        0.00    0.00   0.05   32.32   0.00   0.74     47.00      0.00   0.00     0.00     0.00   0.00   0.00
 
 4)
 sudo iotop -oa (running for about a minute or so)
   TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
   292  be/4  root         0.00 B      0.00 B   0.00 %  99.33 %  [md2_raid1]
  2815  be/4  postgres    19.51 M     25.90 M   0.00 %  45.49 %  postgres: 
 

Re: [PERFORM] Do I have a hardware or a software problem?

2012-12-10 Thread Jeff Janes
On Mon, Dec 10, 2012 at 2:51 PM, Niels Kristian Schjødt
nielskrist...@autouncle.com wrote:

 synchronous_commit = off

 The pg_xlog folder has been moved onto the SSD array (md3), and symlinked
 back into the postgres dir.

With synchronous_commit = off, or with large transactions, there is
probably no advantage to moving those to SSD.


 2)
 When the database is loaded like this, I see a lot of queries taking up to
 1000 times as long as they would when the database is not loaded so
 heavily.

What kinds of queries are they?  single-row look-ups, full table scans, etc.



  Notes and thoughts
 ##

 As you can see, even though I have moved the pg_xlog folder to the SSD array
 (md3), by far the largest amount of writes still goes to the regular HDDs
 (md2), which puzzles me - what can that be?

Every row you insert or non-HOT update has to do maintenance on all
indexes of that table.  If the rows are not inserted/updated in index
order, this means every row inserted/updated dirties a randomly
scattered 8KB for each of the indexes.  If you have lots of indexes
per table, that adds up fast.
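
How many of the updates are HOT (and so skip that index maintenance) can be
checked from the standard statistics views, e.g.:

SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;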

The fact that there is much more writing than reading tells me that
most of your indexes are in RAM.  The amount of index you are rapidly
reading and dirtying is large enough to fit in RAM, but is not large
enough to fit in shared_buffers + kernel's dirty-buffer comfort level.
 So you are redirtying the same blocks over and over, PG is
desperately dumping them to the kernel (because shared_buffers is too
small to hold them) and the kernel is desperately dumping them to
disk, because vm.dirty_background_ratio is so low.  There is little
opportunity for write-combining, because they don't sit in memory long
enough to accumulate neighbors.

How big are your indexes?
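
In case it helps, the catalogs can answer that directly, e.g.:

SELECT c.relname, pg_size_pretty(pg_relation_size(c.oid)) AS size
FROM pg_class c
WHERE c.relkind = 'i'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;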

You could really crank up shared_buffers or vm.dirty_background_ratio,
but doing so might cause problems with checkpoints stalling and
latency spikes.  That would probably not be a problem during the
night, but could be during the day.
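
A hedged example of the night-only variant (values are illustrative; the
posted config uses vm.dirty_background_ratio = 1 and vm.dirty_ratio = 10):

# let dirty pages accumulate longer before background writeback starts
sysctl -w vm.dirty_background_ratio=10
sysctl -w vm.dirty_ratio=20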

Rather than moving maintenance to the day and hoping it doesn't
interfere with normal operations, I'd focus on making night-time
maintenance more efficient, for example by dropping indexes (either
just at night, or if some indexes are not useful, just get rid of them
altogether), or cranking up shared_buffers at night, or maybe
partitioning or look into pg_bulkload.

 From stat 3) (the iostat) I notice that the SSDs don't seem to be
 anywhere near fully utilized - maybe something other than just pg_xlog could
 be moved here?

I don't know how big each disk is, or how big your various categories
of data are.  Could you move everything to SSD?  Could you move all
your actively updated indexes there?

Or, more fundamentally, it looks like you spent too much on CPUs (86%
idle) and not nearly enough on disks.  Maybe you can fix that for less
money than it will cost you in your optimization time to make the best
of the disks you already have.

Cheers,

Jeff

