Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

Brad Nicholson Tue, 10 Aug 2010 12:51:56 -0700

 On 8/10/2010 3:28 PM, Karl Denninger wrote:

Brad Nicholson wrote:
On 8/10/2010 2:38 PM, Karl Denninger wrote:
Scott Marlowe wrote:
On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <[email protected]> wrote:
ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use.  It is simply a matter of time
What about read-only slaves where there's a master with 100+ spinning
hard drives "getting it right" and you need a half dozen or so read
slaves?  I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?
A valid case is a Slony replica used for query offloading (not for DR). It's considered a read-only subscriber from Slony's perspective, as only Slony can modify the data (although you are technically correct that it is not read-only; "controlled write" may be more accurate).
In case of failure, a rebuild + resubscribe gets you back to the same consistency. If you have high I/O requirements and don't have the budget to rack up extra disk arrays to meet them, it could be an option.
CAREFUL with that model and beliefs.

Specifically, the following will hose you without warning:

1. SLONY gets a change on the master.
2. SLONY commits it to the (read-only) slave.
3. Confirmation comes back to the master that the change was propagated.
4. Slave CRASHES without actually committing the changed data to stable storage.

What will hose you is assuming that your data will be okay in the case of a failure, which is a very bad assumption to make with unreliable SSDs. You are assuming I am implying that these should be treated like reliable media; I am not.

In case of failure, you need to assume data loss until proven otherwise. If there is a problem, rebuild.

When the slave restarts, it will not know that the transaction was lost. Neither will the master, since it was told that the transaction was committed. Slony will happily go on its way and replicate forward without any indication of a problem, except that on the slave one or more transactions are **missing**.


Correct.

Some time later you issue an update that goes to the slave, but the change previously lost causes the slave commit to violate referential integrity. SLONY will fail to propagate that change and all behind it; it effectively locks at that point in time.
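To make the sequence above concrete, here is a toy model in Python. It is not Slony and does not use any Slony API; it assumes, for illustration only, a slave whose disk acknowledges a commit once the data reaches a volatile cache, and loses that cache on a crash. The class and function names are hypothetical.

```python
class LyingDiskSlave:
    """Hypothetical slave whose disk acks writes before they are durable."""

    def __init__(self):
        self.durable = {}  # rows that actually reached stable storage
        self.cache = {}    # rows acknowledged but still only in volatile cache

    def apply(self, key, value, parent=None):
        # Enforce a referential-integrity rule: a child row needs its parent.
        visible = {**self.durable, **self.cache}
        if parent is not None and parent not in visible:
            raise ValueError(f"FK violation: parent {parent!r} missing")
        self.cache[key] = value
        return "committed"  # the ack goes back to the master immediately

    def flush(self):
        # Cache contents finally reach stable storage.
        self.durable.update(self.cache)
        self.cache.clear()

    def crash(self):
        # Power loss: the volatile cache is gone, and nobody is told.
        self.cache.clear()


def run_scenario():
    slave = LyingDiskSlave()
    master_log = []
    slave.apply("customer:1", "Alice")
    slave.flush()
    # Steps 1-3: change replicated to the slave and acknowledged to the master.
    ack = slave.apply("order:7", "widget", parent="customer:1")
    master_log.append(("order:7", ack))
    # Step 4: slave crashes before the cache is flushed.
    slave.crash()
    # Later: a change that depends on the silently lost row.
    try:
        slave.apply("line:7.1", "qty=3", parent="order:7")
        return master_log, "replicated"
    except ValueError as exc:
        return master_log, f"replication stalls: {exc}"
```

Calling `run_scenario()` shows the master still believing `order:7` was committed while the dependent change fails on the slave, which is the point where replication to that subscriber stops.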

It will lock data flow to that subscriber, but not to others.

You can recover from this by dropping the slave from replication and re-inserting it, but that forces a full-table copy of everything in the replication set. The bad news is that the queries to the slave in question may have been returning erroneous data for some unknown period of time prior to the lockup in replication (which hopefully you detect reasonably quickly: you ARE watching SLONY queue depth with some automated process, right?)
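A minimal sketch of the check such an automated process might run, assuming rows have already been fetched from Slony-I's `sl_status` view (columns `st_received`, `st_lag_num_events`, `st_lag_time`). The cluster schema name `_mycluster` and both thresholds are hypothetical placeholders; the database query itself is left out so the logic stays self-contained.

```python
from datetime import timedelta

# Hypothetical thresholds; tune them to your own replication traffic.
MAX_LAG_EVENTS = 100
MAX_LAG_TIME = timedelta(minutes=5)


def lagging_subscribers(rows, max_events=MAX_LAG_EVENTS, max_time=MAX_LAG_TIME):
    """Return subscriber node ids whose replication lag exceeds a threshold.

    rows: iterable of (st_received, st_lag_num_events, st_lag_time) tuples,
    e.g. as a driver would return them for
        SELECT st_received, st_lag_num_events, st_lag_time
        FROM _mycluster.sl_status;
    where "_mycluster" stands in for your Slony cluster schema.
    """
    return sorted(
        node
        for node, lag_events, lag_time in rows
        if lag_events > max_events or lag_time > max_time
    )
```

For example, a subscriber that is 5000 events or ten minutes behind would be flagged, while one a few events behind would not. A stalled subscriber's lag only grows, so alerting on either measure catches the lockup described above.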

There are ways around that: run two subscribers and redirect your queries on failure. Don't bring up the failed replica until it is verified or rebuilt.

I can both cause this in the lab and have had it happen in the field. It's a nasty little problem that bit me on a series of disks that claimed to have write caching off, but in fact did not. I was very happy that the data on the master was good at that point, as if I had needed to fail over to the slave (thinking it was a "good" copy) I would have been in SERIOUS trouble.
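One rough way to catch a drive that claims its write cache is off but is not honoring it is to time small write-plus-fsync cycles. A single 7200 RPM disk that truly waits for the platter can complete at most about 7200 / 60 = 120 such commits per second, so a much higher rate from such a drive suggests the cache is acknowledging writes early. This is a heuristic sketch, not proof of safety: drives with battery- or capacitor-backed caches legitimately exceed this figure, and only a pull-the-plug test shows whether acknowledged writes actually survive power loss. The function name is hypothetical.

```python
import os
import tempfile
import time


def fsyncs_per_second(path, count=200):
    """Time `count` small overwrite+fsync cycles on the file at `path`."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for i in range(count):
            # Rewrite the same 8 bytes, then ask the OS to push them to disk.
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, i.to_bytes(8, "little"))
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Placeholder location: point this at a file on the drive under test.
    with tempfile.TemporaryDirectory() as d:
        rate = fsyncs_per_second(os.path.join(d, "probe"))
    print(f"{rate:.0f} fsyncs/sec")
```

Run it with the probe file on the disk in question; a plain desktop HDD reporting thousands of fsyncs per second is a red flag.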


It's very easy to cause those sorts of problems.

What I am saying is that the technology can have a use, if you are aware of the sharp edges and can both work around them and live with them. Everything you are citing is correct, but it implies that these drives are being thrown in blindly, without understanding the risks and mitigating them.

I'm also not suggesting that this is a configuration I would endorse, but it could potentially save a lot of money in certain use cases.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.
