Re: [GENERAL] Dangers of fsync = off

2007-05-10 Thread Joel Dice

Thanks, Bill and Scott, for your responses.

To summarize, turning fsync off on the master of a Slony-I cluster is 
probably safe if you observe the following:


  1. When failover occurs, drop all databases on the failed machine and 
sync it with the new master before re-introducing it into the cluster. 
Note that the failed machine must not be returned to use until this is 
done.


  2. Be aware that the above implies that you will lose any transactions 
which did not reach the standby machine prior to failure, violating the 
Durability component of ACID.  This is true of any system which relies on 
asynchronous replication and automatic failover.
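The trade-off in point 2 can be made concrete with a toy model. This is a minimal sketch. It assumes a master that acknowledges each commit immediately and a replication daemon that has shipped only a prefix of those commits when the master dies. Node and simulate_failover are hypothetical names, not Slony-I APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: just an ordered list of committed transaction IDs."""
    name: str
    committed: list[int] = field(default_factory=list)


def simulate_failover(num_commits: int, replicated_upto: int) -> tuple[list[int], list[int]]:
    """Commit num_commits transactions on master X, ship only the first
    replicated_upto of them to standby Y, then fail X.

    Returns (transactions that survive on Y, transactions lost with X).
    """
    master, standby = Node("X"), Node("Y")

    for xid in range(1, num_commits + 1):
        # X acknowledges each commit at once; asynchronous replication
        # does not wait for Y before reporting success to the client.
        master.committed.append(xid)

    # The replication daemon has copied only a prefix of X's history.
    standby.committed.extend(master.committed[:replicated_upto])

    # X fails and its databases are dropped rather than salvaged, so every
    # commit that was acknowledged but never shipped is gone.
    shipped = set(standby.committed)
    lost = [xid for xid in master.committed if xid not in shipped]
    return list(standby.committed), lost


survivors, lost = simulate_failover(num_commits=10, replicated_upto=7)
print("survive on Y:", survivors)  # [1, 2, 3, 4, 5, 6, 7]
print("lost with X: ", lost)       # [8, 9, 10]
```

The loss is determined entirely by replication lag at the moment of failure. The fsync setting on X does not enter into it, because X's disks are discarded either way.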


 - Joel

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Dangers of fsync = off

2007-05-09 Thread Joel Dice

Thanks for your response, Andrew.

On Tue, 8 May 2007, Andrew Sullivan wrote:


> On Fri, May 04, 2007 at 08:54:10AM -0600, Joel Dice wrote:
>
>> My next question is this: what are the dangers of turning fsync off in the
>> context of a high-availability cluster using asynchronous replication?
>
> My real question is why you want to turn it off.  If you're using a
> battery-backed cache on your disk controller, then fsync ought to be
> pretty close to free.  Are you sure that turning it off will deliver
> the benefit you think it will?


You may very well be right.  I tend to think in terms of software 
solutions, but a hardware solution may be most appropriate here.  In any 
case, I'm not at all sure this will bring a significant performance 
improvement.  I just want to understand the implications before I start 
fiddling; if fsync=off is dangerous, it doesn't matter what the 
performance benefits may be.
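You can check whether fsync is actually the bottleneck before changing any setting. This is a rough sketch that assumes Python 3 on the database machine. It times 8 kB appends to a temporary file, with and without an os.fsync() after each one (8 kB is PostgreSQL's default block size). It measures only the raw cost of forcing writes to stable storage on that disk, not PostgreSQL's own commit path. time_writes is a hypothetical helper. Pass directory= so that the file lands on the same filesystem as the database, since the default temporary directory may sit on a different disk or in RAM.

```python
import os
import tempfile
import time
from typing import Optional


def time_writes(num_writes: int, use_fsync: bool,
                directory: Optional[str] = None,
                block: bytes = b"x" * 8192) -> float:
    """Append num_writes blocks to a temporary file and return elapsed seconds.

    With use_fsync=True each write is followed by os.fsync(), roughly like a
    commit that must reach stable storage before it is acknowledged.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(num_writes):
            os.write(fd, block)
            if use_fsync:
                os.fsync(fd)  # block until the kernel reports the data is durable
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


n = 100
with_sync = time_writes(n, use_fsync=True)
without_sync = time_writes(n, use_fsync=False)
print(f"{n} writes with fsync:    {with_sync:.4f} s")
print(f"{n} writes without fsync: {without_sync:.4f} s")
print(f"ratio: {with_sync / max(without_sync, 1e-9):.1f}x")
```

On a controller with a battery-backed write cache, the two timings should be much closer than on a bare disk. That is the situation Andrew describes, in which turning fsync off buys little.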



>> This task involves dropping any databases on X and having it sync with the
>> versions on Y.  Thus, database corruption on X is irrelevant since our
>> first step is to drop them.
>
> Not if the corruption introduces problems for replication, which is
> indeed possible.


That's exactly what I want to understand.  How, exactly, is this possible? 
If the danger of turning fsync off is that it may leave the on-disk copy of 
the database inconsistent after a crash, it would not seem to have any 
implications for activity occurring prior to the crash.  In particular, a 
trigger-based replication system would seem to be immune.


In other words, while there may be ways the master could cause corruption 
on the slave, I don't see how they could be related to the fsync setting.
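The argument about trigger-based replication can be pictured with a toy model. This is a minimal sketch with several assumptions. A single origin has its writes copied by a row trigger into a sequenced change log. A subscriber applies the log entries it has not yet seen, in order. Master and Subscriber are hypothetical classes and only loosely modelled on Slony-I. Y's copy depends only on log entries it read while X was healthy. The last lines show one possible reading of Andrew's caveat: if a crashed X were restarted and kept feeding Y, a damaged log entry would be replicated as if it were real.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical origin: a table plus a trigger-maintained change log."""
    table: dict[int, str] = field(default_factory=dict)
    change_log: list[tuple[int, int, str]] = field(default_factory=list)  # (seq, key, value)

    def update(self, key: int, value: str) -> None:
        self.table[key] = value
        # Stand-in for a row trigger: each change is also appended to a log
        # with a monotonically increasing sequence number.
        self.change_log.append((len(self.change_log) + 1, key, value))


@dataclass
class Subscriber:
    table: dict[int, str] = field(default_factory=dict)
    last_applied: int = 0

    def pull(self, origin: Master) -> None:
        # Apply, in sequence order, only entries not applied before.
        for seq, key, value in origin.change_log:
            if seq > self.last_applied:
                self.table[key] = value
                self.last_applied = seq


x, y = Master(), Subscriber()
x.update(1, "a")
x.update(2, "b")
y.pull(x)          # events 1 and 2 reach Y before the crash
x.update(3, "c")   # committed on X but never pulled

# X crashes; damage to its table afterwards does not touch Y's copy.
x.table[2] = "garbage"
print(y.table)     # {1: 'a', 2: 'b'}

# Possible reading of the caveat: X is restarted, its log was damaged too,
# and Y keeps pulling from it, so the bad entry is applied on Y.
x.change_log[2] = (3, 3, "garbage")
y.pull(x)
print(y.table)     # {1: 'a', 2: 'b', 3: 'garbage'}
```

The failover plan above never pulls from X again after it fails, which is exactly what keeps the second half of this sketch from happening.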


 - Joel



Re: [GENERAL] Dangers of fsync = off

2007-05-07 Thread Joel Dice

Thanks for the explanation, Tom.  I understand the problem now.

My next question is this: what are the dangers of turning fsync off in the 
context of a high-availability cluster using asynchronous replication?


In particular, we are using Slony-I and linux-ha to provide a two-node, 
master-slave cluster.  As you may know, Slony-I uses triggers to provide 
asynchronous replication.  If the master (X) fails, the slave (Y) becomes 
active.  At this point, the administrator manually performs a recovery by 
reintroducing X so that Y is the master and X is the slave.  This task 
involves dropping any databases on X and having it sync with the versions 
on Y.  Thus, database corruption on X is irrelevant since our first step 
is to drop them.
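The recovery procedure for X amounts to a short sequence of states that must be walked in order. This is a minimal sketch, assuming the administrator's tooling tracks each node's state. NodeState, ALLOWED and advance are hypothetical names, not part of Slony-I or linux-ha. The sketch's only job is to refuse to return X to the cluster straight from the failed state.

```python
from enum import Enum, auto


class NodeState(Enum):
    FAILED = auto()   # crashed former master; on-disk state untrusted
    DROPPED = auto()  # all databases on the node have been dropped
    SYNCED = auto()   # fresh copy made from the current master
    SLAVE = auto()    # back in the cluster as a subscriber


# Only these transitions are allowed; in particular FAILED -> SLAVE is not.
ALLOWED = {
    NodeState.FAILED: {NodeState.DROPPED},
    NodeState.DROPPED: {NodeState.SYNCED},
    NodeState.SYNCED: {NodeState.SLAVE},
    NodeState.SLAVE: set(),
}


def advance(current: NodeState, target: NodeState) -> NodeState:
    """Move a node to target, or raise if that skips a required step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


state = NodeState.FAILED
for step in (NodeState.DROPPED, NodeState.SYNCED, NodeState.SLAVE):
    state = advance(state, step)
print(state.name)  # SLAVE

try:
    advance(NodeState.FAILED, NodeState.SLAVE)
except ValueError as err:
    print(err)  # cannot go from FAILED to SLAVE
```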


It would seem that our only exposure is that both machines fail before the 
administrator is able to perform the recovery.  Even that could be solved 
by leaving fsync turned on for the slave, so that when failover occurs and 
the slave becomes active, we only turn fsync off once we've safely 
reintroduced the other machine (which, in turn, will have fsync turned on).
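The fsync rule proposed in the previous paragraph fits in one small decision function. This is a sketch, assuming only two roles and a flag saying whether the other machine is a caught-up slave that is itself running with fsync on. fsync_may_be_off is a hypothetical helper. Real tooling would still need to edit postgresql.conf and reload the server.

```python
def fsync_may_be_off(role: str, peer_is_synced_slave: bool) -> bool:
    """Return True if a node may run with fsync = off under the proposed policy.

    role: "master" or "slave" (labels chosen for this sketch).
    peer_is_synced_slave: True if the other machine is a caught-up
    subscriber that is itself running with fsync = on.
    """
    if role not in ("master", "slave"):
        raise ValueError(f"unknown role: {role!r}")
    if role == "slave":
        # The slave always keeps fsync on: after a failover it holds the
        # only copy of the data until the old master has been rebuilt.
        return False
    # A master may relax durability only while a durable slave exists.
    return peer_is_synced_slave


scenarios = [
    ("normal operation, X is master", "master", True),
    ("normal operation, Y is slave", "slave", True),
    ("after failover, Y is master, X not yet rebuilt", "master", False),
    ("after X is reintroduced, Y is master", "master", True),
]
for label, role, peer_ok in scenarios:
    setting = "off" if fsync_may_be_off(role, peer_ok) else "on"
    print(f"{label:48s} fsync = {setting}")
```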


There was a discussion about this here:

  http://gborg.postgresql.org/pipermail/slony1-general/2005-March/001760.html

However, that discussion seems to assume that the administrator needs to 
salvage the databases on the failed machine, which is not necessary in 
our case.


In short, is there any danger (besides losing a few transactions) of 
turning fsync off on the master of a cluster using asynchronous 
replication, assuming we don't need to recover the data from the master 
when it fails?


Thanks.

 - Joel



[GENERAL] Dangers of fsync = off

2007-05-03 Thread Joel Dice

Hello all.

It's clear from the documentation for the fsync configuration option that 
turning it off may lead to unrecoverable data corruption.  I'd like to 
learn more about why this is possible and how likely it really is.


A quick look at xlog.h reveals that each record in the transaction log 
contains a CRC checksum, a transaction ID, a length, etc.  Assuming the 
worst thing that can happen due to a crash is that the end of the log is 
filled with random garbage, there seems to be little danger that the 
recovery process will misinterpret any of that garbage as a valid 
transaction record, complete with matching checksum.
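The reasoning about the checksum can be illustrated with a toy log record. This is a minimal sketch with an assumed, simplified layout: a 4-byte CRC-32 computed with Python's zlib.crc32, then a transaction ID, a payload length and the payload. It is not PostgreSQL's actual XLogRecord format. make_record, is_valid and count_false_accepts are hypothetical names. For random damage, the chance that a corrupted record still matches its CRC-32 is on the order of 1 in 2^32. Any change confined to a 32-bit burst, such as a single altered byte, is always detected.

```python
import random
import struct
import zlib

# Hypothetical layout: CRC-32, xid, payload length (all little-endian uint32).
HEADER = struct.Struct("<III")


def make_record(xid: int, payload: bytes) -> bytes:
    body = struct.pack("<II", xid, len(payload)) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


def is_valid(record: bytes) -> bool:
    """Accept a record only if its length field and CRC both check out."""
    if len(record) < HEADER.size:
        return False
    crc, _xid, length = HEADER.unpack_from(record)
    if HEADER.size + length != len(record):
        return False
    return zlib.crc32(record[4:]) == crc


def count_false_accepts(record: bytes, trials: int, seed: int = 0) -> int:
    """Overwrite a random stretch of the payload with random bytes (a crude
    model of a torn write at the end of the log) and count how often the
    damaged record still passes validation."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        damaged = bytearray(record)
        start = rng.randrange(HEADER.size, len(record))
        end = rng.randrange(start + 1, len(record) + 1)
        for i in range(start, end):
            damaged[i] = rng.randrange(256)
        if bytes(damaged) != record and is_valid(bytes(damaged)):
            accepted += 1
    return accepted


rec = make_record(xid=1234, payload=b"new tuple data " * 8)
print("intact record valid:", is_valid(rec))  # True
print("false accepts in 20000 trials:", count_false_accepts(rec, 20_000))
```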


If my assumption is incorrect (i.e. garbage at the end of the log is not 
the worst that can happen), what else might happen, and how would this 
lead to unrecoverable corruption?  Also, are there any filesystems 
available which avoid such cases?
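One well-known answer to this question involves write ordering rather than garbage at the end of the log. PostgreSQL's write-ahead rule requires that the WAL record describing a change is on disk before the data page carrying that change is written. With fsync off, that ordering is no longer enforced, and the operating system may flush its dirty buffers in any order. After a crash, a data page can therefore be on disk while its WAL record is not, and WAL replay, which only redoes what is in the log, has nothing to reconcile that page with.

The toy model below picks a random physical write order with or without that rule, crashes at a random point, and counts how often a data page survives without its log record. It is a sketch, not PostgreSQL's code, and every name in it is hypothetical.

```python
import random


def physical_write_order(num_changes: int, honor_wal_rule: bool,
                         rng: random.Random) -> list[tuple[str, int]]:
    """Return a random order in which WAL and data-page writes reach disk.

    honor_wal_rule=True stands in for fsync = on: the data page for change k
    may only be written once WAL record k is already on disk.
    honor_wal_rule=False stands in for fsync = off: any order is possible.
    """
    pending = [(kind, k) for k in range(num_changes) for kind in ("wal", "data")]
    wal_on_disk: set[int] = set()
    order: list[tuple[str, int]] = []
    while pending:
        if honor_wal_rule:
            ready = [w for w in pending if w[0] == "wal" or w[1] in wal_on_disk]
        else:
            ready = pending
        write = rng.choice(ready)
        pending.remove(write)
        if write[0] == "wal":
            wal_on_disk.add(write[1])
        order.append(write)
    return order


def unlogged_page_survives(order: list[tuple[str, int]], crash_after: int) -> bool:
    """True if, after the crash, some data page is on disk but its WAL record is not."""
    survived = order[:crash_after]
    logged = {k for kind, k in survived if kind == "wal"}
    return any(kind == "data" and k not in logged for kind, k in survived)


def violation_rate(honor_wal_rule: bool, trials: int = 10_000,
                   num_changes: int = 5, seed: int = 1) -> float:
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        order = physical_write_order(num_changes, honor_wal_rule, rng)
        crash_after = rng.randrange(len(order) + 1)  # crash at a random point
        if unlogged_page_survives(order, crash_after):
            bad += 1
    return bad / trials


print("fsync = on :", violation_rate(honor_wal_rule=True))  # 0.0
print("fsync = off:", violation_rate(honor_wal_rule=False))
```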


Sorry if this has been discussed before - in which case please point me to 
that discussion.


Thanks.

 - Joel
