On Aug 10, 2010, at 9:21 AM, Greg Smith wrote:

> Scott Carey wrote:
>> Also, the amount of data at risk in a power loss varies between 
>> drives.  For Intel's drives, its a small chunk of data ( < 256K).  For 
>> some other drives, the cache can be over 30MB of outstanding writes.
>> For some workloads this is acceptable
> 
> No, it isn't ever acceptable.  You can expect the type of data loss you 
> get when a cache fails to honor write flush calls results in 
> catastrophic database corruption.  It's not "I lost the last few 
> seconds";

I never said it was.

> it's "the database is corrupted and won't start" after a 
> crash.  

Which is sometimes acceptable.  There is NO GUARANTEE that you won't lose 
data, ever.  An increase in the likelihood is an acceptable tradeoff in some 
situations, especially when it is small.  On ANY power loss event, with or 
without battery backed caches and such, you should do a consistency check on 
the system proactively.  With less reliable hardware, that task becomes much 
more of a burden, and is much more likely to require restoring data from 
somewhere.

What is the likelihood that your RAID card fails, or that the battery that 
reported 'good health' only lasts 5 minutes and you lose data before power is 
restored?   What is the likelihood of human error?
Not that far off from the likelihood of power failure in a datacenter with 
redundant power.  One MUST have a DR plan.  Never assume that your perfect 
hardware won't fail.

> This is why we pound on this topic on this list.  A SSD that 
> fails to honor flush requests is completely worthless for anything other 
> than toy databases.  

Overblown.  Not every DB and use case is a financial application or business 
critical app.  Many are not toys at all.  Slave, read-only DBs (or simply 
subset tablespaces) ...

Indexes. (per application, schema)
Tables. (per application, schema)
System tables / indexes.
WAL.

Each has different reliability requirements and different consequences from 
losing recently written data.  Losing less than 8K can be fatal to the WAL, 
or to table data.  
Corrupting some tablespaces is not a big deal.  Corrupting others is 
catastrophic.  The problem with the assertion that this hardware is worthless 
is that it implies that every user, every use case, is at the far end of the 
reliability requirement spectrum.

Yes, that can be a critical requirement for many, perhaps most, DBs.  But 
there are many uses for slightly unsafe storage systems.

> You can expect significant work to recover any 
> portion of your data after the first unexpected power loss under heavy 
> write load in this environment, during which you're down.  We do 
> database corruption recovery at 2ndQuadrant; while I can't talk about 
> the details of some recent incidents, I am not speaking theoretically 
> when I warn about this.

I've done the single-user-mode, recover-the-system-tables-by-hand thing myself 
at 4AM, on a system with battery-backed RAID 10, redundant power, etc.  RAID 
cards die, and 10TB recovery times from backup are long.

It's a game of balancing your data loss tolerance against the likelihood of 
power failure.  Both vary widely, and not just with 'toy' dbs.  If you know 
what you are doing, you can use 'fast but not completely safe' storage for 
many things safely.  The chance of loss is NEVER zero; do not assume that 
'good' hardware is flawless.

Imagine a common internet case where synchronous_commit=false is fine.  
Recovery from backups is a pain (but a daily snapshot is taken of the important 
tables, and a weekly one of the other, easily recoverable stuff).  If you 
expect one power-related failure every 2 years, it might be perfectly 
reasonable to use 'unsafe' SSDs to support a high transaction load, accepting 
the risk that the once-every-2-years downtime is 12 hours long instead of 30 
minutes, and includes losing up to a day's information.  Applications like 
this exist all over the place.
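
To put rough numbers on that tradeoff, here is a minimal Python sketch.  It 
only multiplies the failure rate by the recovery time, using the figures from 
the example above; the function and variable names are made up for 
illustration, and it deliberately ignores the cost of the lost day of data, 
which you would have to weigh separately.

```python
# Rough expected-downtime comparison for the example scenario.
# Assumed inputs (taken from the paragraph above): one power-related failure
# every 2 years, 12 hours of recovery on 'unsafe' SSDs, 30 minutes otherwise.

def expected_downtime_hours_per_year(failures_per_year, hours_per_failure):
    """Expected annual downtime = failure rate * recovery time per failure."""
    return failures_per_year * hours_per_failure

FAILURES_PER_YEAR = 1 / 2  # one power-related failure every 2 years

unsafe = expected_downtime_hours_per_year(FAILURES_PER_YEAR, 12.0)
safe = expected_downtime_hours_per_year(FAILURES_PER_YEAR, 0.5)

print(f"'unsafe' SSD:  {unsafe:.2f} hours/year")
print(f"safe storage:  {safe:.2f} hours/year")
print(f"extra downtime accepted: {unsafe - safe:.2f} hours/year")
```

With those inputs the difference is a few hours of downtime per year, which 
is the kind of figure you can then weigh against the transaction load the 
faster storage buys you.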


> -- 
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
> 

