[PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling


I'm trying to eke a little bit more performance out of an application, and 
I was wondering if there was a better way to do the following:


I am trying to retrieve, for many sets of rows grouped on a couple of 
fields, the value of an ungrouped field where the row has the highest 
value in another ungrouped field. For instance, I have the following table 
setup:


group  | whatever type
value  | whatever type
number | int
Index: group

I then have rows like this:

group | value     | number
------+-----------+--------
Foo   | foo       | 1
Foo   | turnips   | 2
Bar   | albatross | 3
Bar   | monkey    | 4

I want to receive results like this:

group | value
------+---------
Foo   | turnips
Bar   | monkey

Currently, I do this in my application by ordering by the number and only 
using the last value. I imagine that this is something that can be done in 
the new Postgres 9, with a sorted group by - something like this:


SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group

Is this something that is already built in, or would I have to write my 
own LAST aggregate function?
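
A minimal sketch of such a hand-rolled LAST aggregate (the helper name
last_agg is made up; the quoting is only needed because the example names
group, value and table collide with SQL keywords):

CREATE FUNCTION last_agg(anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE AS 'SELECT $2';

CREATE AGGREGATE last(anyelement) (
    SFUNC = last_agg,
    STYPE = anyelement
);

-- 9.0 accepts ORDER BY inside an aggregate call, so this returns, per group,
-- the value belonging to the highest number:
SELECT "group", last(value ORDER BY number)
FROM "table"
GROUP BY "group";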


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like "ha-ha-ha", but in a sympathetic tone of voice
   -- Computer Science Lecturer




Re: [PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling

On Tue, 10 Aug 2010, Thomas Kellerer wrote:

No. It's built in (8.4) and it's called Windowing functions:
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresql.org/docs/8.4/static/functions-window.html

SELECT group, last_value(value) over(ORDER BY number)
FROM table


I may be mistaken, but as I understand it, a windowing function doesn't 
reduce the number of rows in the results?
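
For illustration, with the sample table from the first message: last_value()
on its own keeps one output row per input row, and with its default frame it
only looks as far as the current row, so collapsing to one row per group takes
an explicit frame plus a DISTINCT (or the row_number() approach that appears
later in the thread):

SELECT DISTINCT "group",
       last_value(value) OVER (PARTITION BY "group" ORDER BY number
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING)
FROM "table";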


Matthew

--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats. -- Howard Aiken



Re: [PERFORM] Sorted group by

2010-08-10 Thread Thom Brown
On 10 August 2010 17:03, Matthew Wakeling matt...@flymine.org wrote:
 On Tue, 10 Aug 2010, Thomas Kellerer wrote:

 No. It's built in (8.4) and it's called Windowing functions:
 http://www.postgresql.org/docs/8.4/static/tutorial-window.html
 http://www.postgresql.org/docs/8.4/static/functions-window.html

 SELECT group, last_value(value) over(ORDER BY number)
 FROM table

 I may be mistaken, but as I understand it, a windowing function doesn't
 reduce the number of rows in the results?


I think you are mistaken.  The last_value function is a window
function aggregate.  Give it a try.

-- 
Thom Brown
Registered Linux user: #516935



Re: [PERFORM] Sorted group by

2010-08-10 Thread hubert depesz lubaczewski
On Tue, Aug 10, 2010 at 04:40:16PM +0100, Matthew Wakeling wrote:
 
 I'm trying to eke a little bit more performance out of an
 application, and I was wondering if there was a better way to do the
 following:
 
 I am trying to retrieve, for many sets of rows grouped on a couple
 of fields, the value of an ungrouped field where the row has the
 highest value in another ungrouped field. For instance, I have the
 following table setup:
 
 group  | whatever type
 value  | whatever type
 number | int
 Index: group
 
 I then have rows like this:
 
 group | value     | number
 ------+-----------+--------
 Foo   | foo       | 1
 Foo   | turnips   | 2
 Bar   | albatross | 3
 Bar   | monkey    | 4
 
 I want to receive results like this:
 
 group | value
 ------+---------
 Foo   | turnips
 Bar   | monkey
 
 Currently, I do this in my application by ordering by the number and
 only using the last value. I imagine that this is something that can
 be done in the new Postgres 9, with a sorted group by - something
 like this:
 
 SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group
 
 Is this something that is already built in, or would I have to write
 my own LAST aggregate function?

this is trivially done when using 'distinct on':
select distinct on (group) *
from table
order by group desc, number desc;
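
Worth noting: the ORDER BY here can be satisfied by a two-column index rather
than a sort, along the lines of the statement below (the index name is made
up; whether the planner actually uses it depends on the data):

CREATE INDEX tbl_group_number_idx ON "table" ("group", number);
-- scanned backwards, this matches ORDER BY "group" DESC, number DESC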

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Greg Smith

Scott Carey wrote:
Also, the amount of data at risk in a power loss varies between 
drives.  For Intel's drives, it's a small chunk of data (< 256K).  For 
some other drives, the cache can be over 30MB of outstanding writes.

For some workloads this is acceptable


No, it isn't ever acceptable.  You can expect that the type of data loss 
you get when a cache fails to honor write flush calls results in 
catastrophic database corruption.  It's not "I lost the last few 
seconds"; it's "the database is corrupted and won't start after a 
crash."  This is why we pound on this topic on this list.  An SSD that 
fails to honor flush requests is completely worthless for anything other 
than toy databases.  You can expect significant work to recover any 
portion of your data after the first unexpected power loss under heavy 
write load in this environment, during which you're down.  We do 
database corruption recovery at 2ndQuadrant; while I can't talk about 
the details of some recent incidents, I am not speaking theoretically 
when I warn about this.


Michael, I would suggest you read 
http://www.postgresql.org/docs/current/static/wal-reliability.html and 
link to it at the end of your article.  You are recommending that people 
consider a configuration that will result in their data being lost.  
That can be acceptable, if for example your data can be recreated 
from backups or the like.  But people should be extremely clear that 
this trade-off is happening, and your blog post is not doing that yet.  
Part of the reason for the bang per buck you're seeing here is that 
cheap SSDs are cheating.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Sorted group by

2010-08-10 Thread Thomas Kellerer

Matthew Wakeling wrote on 10.08.2010 18:03:

On Tue, 10 Aug 2010, Thomas Kellerer wrote:

No. It's built in (8.4) and it's called Windowing functions:
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresql.org/docs/8.4/static/functions-window.html

SELECT group, last_value(value) over(ORDER BY number)
FROM table


I may be mistaken, but as I understand it, a windowing function doesn't
reduce the number of rows in the results?


Yes you are right, a bit too quick on my side ;)

But this might be what you are after then:

select group_, value_
from (
  select group_, value_, number_,
         row_number() over (partition by group_ order by number_ desc) as row_num
  from numbers
) t
where row_num = 1
order by group_ desc




Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Greg Smith

Jeff Davis wrote:

Depending on which 256K you lose, you might as well lose your entire
database.
  


Let's be nice and assume that you only lose one 8K block because of the 
SSD write cache; that's not so bad, right?  Guess what--you could easily 
be the next lucky person who discovers that the corrupted block is actually 
in the middle of the pg_class system catalog, where the list of tables 
in the database is kept!  Enjoy getting your data back again with that 
piece missing.  It's really a fun time, I'll tell you that.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Sorted group by

2010-08-10 Thread Thom Brown
On 10 August 2010 17:06, Thom Brown t...@linux.com wrote:
 On 10 August 2010 17:03, Matthew Wakeling matt...@flymine.org wrote:
 On Tue, 10 Aug 2010, Thomas Kellerer wrote:

 No. It's built in (8.4) and it's called Windowing functions:
 http://www.postgresql.org/docs/8.4/static/tutorial-window.html
 http://www.postgresql.org/docs/8.4/static/functions-window.html

 SELECT group, last_value(value) over(ORDER BY number)
 FROM table

 I may be mistaken, but as I understand it, a windowing function doesn't
 reduce the number of rows in the results?


 I think you are mistaken.  The last_value function is a window
 function aggregate.  Give it a try.


D'oh, no, I'm mistaken.  My brain has been malfunctioning today.

-- 
Thom Brown
Registered Linux user: #516935



Re: [PERFORM] Sorted group by

2010-08-10 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote:
 
 I'm trying to eke a little bit more performance out of an
 application
 
In addition to the suggestion from Thomas Kellerer, it would be
interesting to try the following and see how performance compares
using real data.
 
select group, value from tbl x
  where not exists
    (select * from tbl y
      where y.group = x.group and y.number > x.number);
 
We have a lot of code using this general technique, and I'm curious
whether there are big gains to be had by moving to the windowing
functions.  (I suspect there are.)
 
-Kevin



Re: [PERFORM] Sorted group by

2010-08-10 Thread Jonathan Blitz
 
Another couple of possible ways:

Select groupfield, value
From tbl x1
Where number = (select max(number) from tbl x2
                where x2.groupfield = x1.groupfield)


Select groupfield, value
From tbl x1
Where (groupfield, number) in (select groupfield, max(number)
                               from tbl group by groupfield)

Which is quickest?
Probably best to try out and see.
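
One quick way to compare them on real data is to wrap each candidate in
EXPLAIN ANALYZE and look at the actual times, e.g. for the first form above:

EXPLAIN ANALYZE
Select groupfield, value
From tbl x1
Where number = (select max(number) from tbl x2
                where x2.groupfield = x1.groupfield);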

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Kevin Grittner
Sent: Tuesday, August 10, 2010 7:38 PM
To: Matthew Wakeling; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Sorted group by

Matthew Wakeling matt...@flymine.org wrote:
 
 I'm trying to eke a little bit more performance out of an application
 
In addition to the suggestion from Thomas Kellerer, it would be interesting
to try the following and see how performance compares using real data.
 
select group, value from tbl x
  where not exists
    (select * from tbl y
      where y.group = x.group and y.number > x.number);
 
We have a lot of code using this general technique, and I'm curious whether
there are big gains to be had by moving to the windowing functions.  (I
suspect there are.)
 
-Kevin





Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Brad Nicholson

 On 8/10/2010 12:21 PM, Greg Smith wrote:

Scott Carey wrote:
Also, the amount of data at risk in a power loss varies between 
drives.  For Intel's drives, it's a small chunk of data (< 256K).  
For some other drives, the cache can be over 30MB of outstanding writes.

For some workloads this is acceptable


No, it isn't ever acceptable.  You can expect the type of data loss 
you get when a cache fails to honor write flush calls results in 
catastrophic database corruption.  It's not I lost the last few 
seconds; it's the database is corrupted and won't start after a 
crash.  This is why we pound on this topic on this list.  A SSD that 
fails to honor flush requests is completely worthless for anything 
other than toy databases.  You can expect significant work to recover 
any portion of your data after the first unexpected power loss under 
heavy write load in this environment, during which you're down.  We do 
database corruption recovery at 2ndQuadrant; while I can't talk about 
the details of some recent incidents, I am not speaking theoretically 
when I warn about this.




What about putting indexes on them?  If the drive fails and drops writes 
on those, they could be rebuilt - assuming your system can function 
without the index(es) temporarily.



--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Karl Denninger
Brad Nicholson wrote:
  On 8/10/2010 12:21 PM, Greg Smith wrote:
 Scott Carey wrote:
 Also, the amount of data at risk in a power loss varies between
 drives.  For Intel's drives, it's a small chunk of data (< 256K). 
 For some other drives, the cache can be over 30MB of outstanding
 writes.
 For some workloads this is acceptable

 No, it isn't ever acceptable.  You can expect the type of data loss
 you get when a cache fails to honor write flush calls results in
 catastrophic database corruption.  It's not I lost the last few
 seconds; it's the database is corrupted and won't start after a
 crash.  This is why we pound on this topic on this list.  A SSD that
 fails to honor flush requests is completely worthless for anything
 other than toy databases.  You can expect significant work to recover
 any portion of your data after the first unexpected power loss under
 heavy write load in this environment, during which you're down.  We
 do database corruption recovery at 2ndQuadrant; while I can't talk
 about the details of some recent incidents, I am not speaking
 theoretically when I warn about this.

 What about putting indexes on them?  If the drive fails and drops
 writes on those, they could be rebuilt - assuming your system can
 function without the index(es) temporarily.
You could put indices on them but as noted by Scott, he's SPOT ON.

ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time
before you have a failure of some sort that results in catastrophic data
loss.  If you're LUCKY the database won't start and you know you're in
trouble.  If you're UNLUCKY the database DOES start but there is
undetected and unrecoverable data corruption somewhere inside the data
tables, which you WILL discover at the most-inopportune moment (like
when you desperately NEED that business record for some reason.)  You
cannot put either the tables or the logs on such a drive without running
the risk of a data corruption problem that WILL lose data and MAY be
catastrophic, depending on exactly what fails when.

While it is possible to recover that which is not damaged from a
database that has corruption like this, it simply is not possible to
recover data that never made it to the disk - no matter what you do -
and the time and effort expended (not to mention money, if you have to
bring in someone with specialized skills you do not possess) when things
go wrong as a result of such decisions are extreme.

Don't do it.

-- Karl


Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Scott Marlowe
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger k...@denninger.net wrote:

 ANY disk that says write is complete when it really is not is entirely
 unsuitable for ANY real database use.  It is simply a matter of time

What about read only slaves where there's a master with 100+ spinning
hard drives getting it right and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Greg Smith

Brad Nicholson wrote:
What about putting indexes on them?  If the drive fails and drops 
writes on those, they could be rebuilt - assuming your system can 
function without the index(es) temporarily.


Dumping indexes on SSD is one of the better uses for them, presuming you 
can survive what is likely to be an outage from a "can the site handle 
full load?" perspective while they rebuild after a crash.  As I'm sure 
Brad is painfully aware of already, index rebuilding in PostgreSQL can 
take a while.  To spin my broken record here again, the main thing to 
note when you consider that--relocate indexes onto SSD--is that the ones 
you are most concerned about the performance of were likely to be 
already sitting in RAM anyway, meaning the SSD speedup doesn't help 
reads much.  So the giant performance boost just isn't there in that case.
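
For anyone trying this, moving an index onto an SSD-backed tablespace is just
a couple of statements (the path and index name below are made up); after a
crash on a drive that drops writes, the index should be rebuilt rather than
trusted:

CREATE TABLESPACE ssd LOCATION '/mnt/ssd/pg_tblspc';
ALTER INDEX some_big_index SET TABLESPACE ssd;
-- after a suspect crash, rebuild instead of trusting what is on the drive:
REINDEX INDEX some_big_index;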


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Karl Denninger
Scott Marlowe wrote:
 On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger k...@denninger.net wrote:
   
 ANY disk that says write is complete when it really is not is entirely
 unsuitable for ANY real database use.  It is simply a matter of time
 

 What about read only slaves where there's a master with 100+spinning
 hard drives getting it right and you need a half dozen or so read
 slaves?  I can imagine that being ok, as long as you don't restart a
 server after a crash without checking on it.
   
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

IF you mean a server that only accepts SELECTs, does not accept UPDATEs
or INSERTs, and on a crash **reloads the entire database from the
master**, then ok.

Most people who will do this won't reload it after a crash.  They'll
inspect the database and say ok, and put it back online.  Bad Karma
will ensue in the future.

Incidentally, that risk is not theoretical either (I know about this one
from hard experience.  Fortunately the master was still ok and I was
able to force a full-table copy; I didn't like it, as the database was
a few hundred GB, but I had no choice.)

-- Karl


Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Brad Nicholson

 On 8/10/2010 2:28 PM, Greg Smith wrote:

Brad Nicholson wrote:
What about putting indexes on them?  If the drive fails and drops 
writes on those, they could be rebuilt - assuming your system can 
function without the index(es) temporarily.


Dumping indexes on SSD is one of the better uses for them, presuming 
you can survive what is likely to be an outage from a can the site 
handle full load? perspective while they rebuild after a crash.  As 
I'm sure Brad is painfully aware of already, index rebuilding in 
PostgreSQL can take a while.  To spin my broken record here again, the 
main thing to note when you consider that--relocate indexes onto 
SSD--is that the ones you are most concerned about the performance of 
were likely to be already sitting in RAM anyway, meaning the SSD 
speedup doesn't help reads much.  So the giant performance boost just 
isn't there in that case.


The case where I'm thinking they may be of use is for indexes you can 
afford to lose.  I'm thinking of ones that are needed by nightly batch 
jobs, downstream systems or reporting - the sorts of things that you 
can turn off during a rebuild, and where the data sets are not likely 
to be in cache.


We have a few such cases, but we don't need the speed of SSD's for them.

Personally, I wouldn't entertain any SSD without a capacitor backing it for 
anything, even indexes.  Not worth the hassle to me.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.




Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Brad Nicholson

 On 8/10/2010 2:38 PM, Karl Denninger wrote:

Scott Marlowe wrote:

On Tue, Aug 10, 2010 at 12:13 PM, Karl Denningerk...@denninger.net  wrote:
   

ANY disk that says write is complete when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time
 


What about read only slaves where there's a master with 100+spinning
hard drives getting it right and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.
   

A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?



A valid case is a Slony replica if used for query offloading (not for 
DR).  It's considered a read-only subscriber from the perspective of 
Slony as only Slony can modify the data  (although you are technically 
correct, it is not read only - controlled write may be more accurate).


 In case of failure, a rebuild + resubscribe gets you back to the same 
consistency.  If you have high IO requirements, and don't have the 
budget to rack up extra disk arrays to meet them, it could be an option.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Karl Denninger
Brad Nicholson wrote:
 On 8/10/2010 2:38 PM, Karl Denninger wrote:
 Scott Marlowe wrote:
 On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger k...@denninger.net wrote:
   
 ANY disk that says write is complete when it really is not is entirely
 unsuitable for ANY real database use.  It is simply a matter of time
 

 What about read only slaves where there's a master with 100+spinning
 hard drives getting it right and you need a half dozen or so read
 slaves?  I can imagine that being ok, as long as you don't restart a
 server after a crash without checking on it.
   
 A read-only slave isn't read-only, is it?

 I mean, c'mon - how does the data get there?

 A valid case is a Slony replica if used for query offloading (not for
 DR).  It's considered a read-only subscriber from the perspective of
 Slony as only Slony can modify the data  (although you are technically
 correct, it is not read only - controlled write may be more accurate). 

  In case of failure, a rebuild + resubscribe gets you back to the same
 consistency.  If you have high IO requirements, and don't have the
 budget to rack up extra disk arrays to meet them, it could be an option.
CAREFUL with that model and beliefs.

Specifically, the following will hose you without warning:

1. SLONY gets a change on the master.
2. SLONY commits it to the (read-only) slave.
3. Confirmation comes back to the master that the change was propagated.
4. Slave CRASHES without actually committing the changed data to stable
storage.

When the slave restarts it will not know that the transaction was lost. 
Neither will the master, since it was told that it was committed.  Slony
will happily go on its way and replicate forward, without any indication
of a problem - except that on the slave, there are one or more
transactions that are **missing**.

Some time later you issue an update that goes to the slave, but the
change previously lost causes the slave commit to violate referential
integrity.   SLONY will fail to propagate that change and all behind it
- it effectively locks at that point in time.

You can recover from this by dropping the slave from replication and
re-inserting it, but that forces a full-table copy of everything in the
replication set.  The bad news is that the queries to the slave in
question may have been returning erroneous data for some unknown period
of time prior to the lockup in replication (which hopefully you detect
reasonably quickly - you ARE watching SLONY queue depth with some
automated process, right?)
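
A minimal version of that queue-depth check, assuming a Slony-I cluster named
"replication" (so its schema is _replication) and its standard sl_status view:

-- lag in events and in wall-clock time for each subscriber;
-- alert if either keeps growing
SELECT st_received, st_lag_num_events, st_lag_time
FROM _replication.sl_status;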

I can both cause this in the lab and have had it happen in the field. 
It's a nasty little problem that bit me on a series of disks that
claimed to have write caching off, but in fact did not.  I was very
happy that the data on the master was good at that point, as if I had
needed to failover to the slave (thinking it was a good copy) I would
have been in SERIOUS trouble.

-- Karl


Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Brad Nicholson

 On 8/10/2010 3:28 PM, Karl Denninger wrote:

Brad Nicholson wrote:

On 8/10/2010 2:38 PM, Karl Denninger wrote:

Scott Marlowe wrote:

On Tue, Aug 10, 2010 at 12:13 PM, Karl Denningerk...@denninger.net  wrote:
   

ANY disk that says write is complete when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time
 


What about read only slaves where there's a master with 100+spinning
hard drives getting it right and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.
   

A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

A valid case is a Slony replica if used for query offloading (not for 
DR).  It's considered a read-only subscriber from the perspective of 
Slony as only Slony can modify the data  (although you are 
technically correct, it is not read only - controlled write may be 
more accurate).


 In case of failure, a rebuild + resubscribe gets you back to the 
same consistency.  If you have high IO requirements, and don't have 
the budget to rack up extra disk arrays to meet them, it could be an 
option.

CAREFUL with that model and beliefs.

Specifically, the following will hose you without warning:

1. SLONY gets a change on the master.
2. SLONY commits it to the (read-only) slave.
3. Confirmation comes back to the master that the change was propagated.
4. Slave CRASHES without actually committing the changed data to 
stable storage.


What will hose you is assuming that your data will be okay in the case 
of a failure, which is a very bad assumption to make in the case of 
unreliable SSD's.  You are assuming I am implying that these should be 
treated like reliable media - I am not.


In case of failure, you need to assume data loss until proven 
otherwise.  If there is a problem, rebuild.


When the slave restarts it will not know that the transaction was 
lost.  Neither will the master, since it was told that it was 
committed.  Slony will happily go on its way and replicate forward, 
without any indication of a problem - except that on the slave, there 
are one or more transactions that are **missing**.




Correct.
Some time later you issue an update that goes to the slave, but the 
change previously lost causes the slave commit to violate referential 
integrity.   SLONY will fail to propagate that change and all behind 
it - it effectively locks at that point in time.



It will lock data flow to that subscriber, but not to others.

You can recover from this by dropping the slave from replication and 
re-inserting it, but that forces a full-table copy of everything in 
the replication set.  The bad news is that the queries to the slave in 
question may have been returning erroneous data for some unknown 
period of time prior to the lockup in replication (which hopefully you 
detect reasonably quickly - you ARE watching SLONY queue depth with 
some automated process, right?)


There are ways around that - run two subscribers and redirect your 
queries on failure.  Don't bring up the failed replica until it is 
verified or rebuilt.


I can both cause this in the lab and have had it happen in the field.  
It's a nasty little problem that bit me on a series of disks that 
claimed to have write caching off, but in fact did not.  I was very 
happy that the data on the master was good at that point, as if I had 
needed to failover to the slave (thinking it was a good copy) I 
would have been in SERIOUS trouble.


It's very easy to cause those sorts of problems.

What I am saying is that the technology can have a use, if you are 
aware of the sharp edges, and can both work around them and live with 
them.  Everything you are citing is correct, but it assumes that they 
are blindly thrown in without understanding the risks and mitigating 
them.


I'm also not suggesting that this is a configuration I would endorse, 
but it could potentially save a lot of money in certain use cases.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Scott Marlowe
On Tue, Aug 10, 2010 at 12:38 PM, Karl Denninger k...@denninger.net wrote:
 Scott Marlowe wrote:

 On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger k...@denninger.net wrote:


 ANY disk that says write is complete when it really is not is entirely
 unsuitable for ANY real database use.  It is simply a matter of time


 What about read only slaves where there's a master with 100+spinning
 hard drives getting it right and you need a half dozen or so read
 slaves?  I can imagine that being ok, as long as you don't restart a
 server after a crash without checking on it.


 A read-only slave isn't read-only, is it?

 I mean, c'mon - how does the data get there?

Well, duh.  However, what I'm looking at is having two big servers in
failover running on solid reliable hardware, and then a small army of
read only slony slaves that are used for things like sending user rss
feeds and creating weekly reports and such.  These 1U machines with 12
to 24 cores and a single SSD drive are disposable, in the sense that if
they ever crash, there's a simple script to run that reinits the db
and then subscribes them to the set.

My point being, no matter how terrible an idea a certain storage media
is, there's always a use case for it.  Even if it's very narrow.



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Christopher Browne
On Tue, Aug 10, 2010 at 3:52 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 My point being, no matter how terrible an idea a certain storage media
 is, there's always a use case for it.  Even if it's very narrow.

The trouble is, if extra subscribers induce load on the master,
which they presumably will, then that sliver of use case may very
well get obscured by the cost, such that the sliver should be treated
as not existing :-(.
-- 
http://linuxfinances.info/info/linuxdistributions.html



Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

2010-08-10 Thread Scott Marlowe
On Tue, Aug 10, 2010 at 2:00 PM, Christopher Browne cbbro...@gmail.com wrote:
 On Tue, Aug 10, 2010 at 3:52 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 My point being, no matter how terrible an idea a certain storage media
 is, there's always a use case for it.  Even if it's very narrow.

 The trouble is, if extra subscribers induce load on the master,
 which they presumably will, then that sliver of use case may very
 well get obscured by the cost, such that the sliver should be treated
 as not existing :-(.

One master, one slave, master handles all writes, slave handles all of
the other subscribers.  I've run a setup like this with as many as 8
or so slaves at the bottom of the pile with no problems at all.
