Re: [GENERAL] Dangers of fsync = off
Thanks, Bill and Scott, for your responses.

To summarize, turning fsync off on the master of a Slony-I cluster is probably safe if you observe the following:

1. When failover occurs, drop all databases on the failed machine and sync it with the new master before reintroducing it into the cluster. Note that the failed machine must not be returned to use until this is done.

2. Be aware that the above implies that you will lose any transactions which did not reach the standby machine prior to failure, violating the Durability component of ACID. This is true of any system which relies on asynchronous replication and automatic failover.

- Joel

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
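Point 2 of the summary can be made concrete with a toy model. This is a hypothetical sketch, not how Slony-I actually tracks replication events: it treats the master's acknowledged commits as numbered transactions and the standby's progress as the highest transaction it has applied, so a failover discards everything after that point.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncPair:
    """Hypothetical model of a master/standby pair with asynchronous replication."""
    committed: list[int] = field(default_factory=list)  # txn IDs acknowledged by the master
    applied_on_standby: int = 0  # highest txn ID the standby has applied so far

    def commit(self, txid: int) -> None:
        # The client sees success as soon as the master commits;
        # the standby has not necessarily received the change yet.
        self.committed.append(txid)

    def replicate_up_to(self, txid: int) -> None:
        # Replication lag: the standby catches up only periodically.
        self.applied_on_standby = max(self.applied_on_standby, txid)

    def failover(self) -> list[int]:
        """Promote the standby and return the acknowledged txns that are lost."""
        lost = [t for t in self.committed if t > self.applied_on_standby]
        # The failed master's databases are dropped, so only the standby's state survives.
        self.committed = [t for t in self.committed if t <= self.applied_on_standby]
        return lost


pair = AsyncPair()
for txid in range(1, 6):
    pair.commit(txid)
pair.replicate_up_to(3)
print(pair.failover())  # [4, 5]: committed on the master, never reached the standby
```

Whatever the fsync setting, transactions 4 and 5 were reported as committed yet do not survive the failover; that is the Durability gap point 2 describes.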
Re: [GENERAL] Dangers of fsync = off
Thanks for your response, Andrew.

On Tue, 8 May 2007, Andrew Sullivan wrote:

> On Fri, May 04, 2007 at 08:54:10AM -0600, Joel Dice wrote:
>> My next question is this: what are the dangers of turning fsync off in the context of a high-availability cluster using asynchronous replication?
>
> My real question is why you want to turn it off. If you're using a battery-backed cache on your disk controller, then fsync ought to be pretty close to free. Are you sure that turning it off will deliver the benefit you think it will?

You may very well be right. I tend to think in terms of software solutions, but a hardware solution may be most appropriate here. In any case, I'm not at all sure this will bring a significant performance improvement. I just want to understand the implications before I start fiddling; if fsync=off is dangerous, it doesn't matter what the performance benefits may be.

>> ... on Y. Thus, database corruption on X is irrelevant since our first step is to drop them.
>
> Not if the corruption introduces problems for replication, which is indeed possible.

That's exactly what I want to understand. How, exactly, is this possible? If the danger of fsync=off is that it may leave the database's on-disk files in an inconsistent state after a crash, it would not seem to have any implications for activity occurring prior to the crash. In particular, a trigger-based replication system would seem to be immune. In other words, while there may be ways the master could cause corruption on the slave, I don't see how they could be related to the fsync setting.

- Joel
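Andrew's question about whether fsync=off delivers a real benefit can be checked empirically before changing anything. The sketch below is a rough, hypothetical micro-benchmark, not PostgreSQL's I/O path: it appends 8 KB blocks (PostgreSQL's default page size) to a scratch file with and without an `os.fsync()` after each write, ignoring concurrency and the server's own batching. Treat the ratio as a hint only; on a controller with a battery-backed write cache one would expect the two timings to be much closer.

```python
import os
import tempfile
import time
from typing import Optional


def time_writes(n_writes: int, use_fsync: bool, directory: Optional[str] = None,
                block: bytes = b"x" * 8192) -> float:
    """Return the seconds taken to append n_writes blocks to a scratch file.

    With use_fsync=True, os.fsync() is called after every write, a crude
    stand-in for a commit-time log flush; with False, writes may sit in the
    OS cache.
    """
    fd, path = tempfile.mkstemp(dir=directory)  # directory=None uses the system temp dir
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, block)
            if use_fsync:
                # Block until the OS has pushed the data to the device (or its write cache).
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    n = 200
    synced = time_writes(n, use_fsync=True)
    unsynced = time_writes(n, use_fsync=False)
    print(f"{n} x 8 KB writes: {synced:.4f}s with fsync, {unsynced:.4f}s without")
```

For a meaningful comparison, pass `directory` pointing at the filesystem that holds the database's data directory.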
Re: [GENERAL] Dangers of fsync = off
Thanks for the explanation, Tom. I understand the problem now.

My next question is this: what are the dangers of turning fsync off in the context of a high-availability cluster using asynchronous replication? In particular, we are using Slony-I and linux-ha to provide a two-node, master-slave cluster. As you may know, Slony-I uses triggers to provide asynchronous replication.

If the master (X) fails, the slave (Y) becomes active. At this point, the administrator manually performs a recovery by reintroducing X so that Y is the master and X is the slave. This task involves dropping any databases on X and having it sync with the versions on Y. Thus, database corruption on X is irrelevant since our first step is to drop them.

It would seem that our only exposure is that both machines fail before the administrator is able to perform the recovery. Even that could be solved by leaving fsync turned on for the slave: when failover occurs and the slave becomes active, we would turn fsync off only once we've safely reintroduced the other machine (which, in turn, will have fsync turned on).

There was a discussion about this here:

http://gborg.postgresql.org/pipermail/slony1-general/2005-March/001760.html

However, that discussion seems to assume that the administrator needs to salvage the databases on the failed machine, which is not necessary in our case.

In short, is there any danger (besides losing a few transactions) in turning fsync off on the master of a cluster using asynchronous replication, assuming we don't need to recover the data from the master when it fails?

Thanks.

- Joel
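The failover procedure described in this message, including the rule of leaving fsync on for the slave, can be modelled as a tiny state machine. This is a hypothetical sketch: `Node`, `fail_over`, `reintroduce`, and `fsync_off_is_covered` are illustrative names, not Slony-I or linux-ha interfaces. It checks the invariant the plan relies on: a live node runs with fsync off only while it is the master and a slave with fsync on exists.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str    # "master", "slave", or "failed"
    fsync: bool  # this node's fsync setting


def fail_over(master: Node, slave: Node) -> None:
    """The master crashed: promote the slave."""
    master.role = "failed"
    slave.role = "master"
    # Keep fsync on: until the failed node is rebuilt, this node holds the only copy.
    slave.fsync = True


def reintroduce(failed: Node, new_master: Node) -> None:
    """Administrator's recovery step for the failed node."""
    assert failed.role == "failed", "only a failed node is rebuilt this way"
    # Its databases are dropped and re-synced from the master, so nothing
    # (possibly corrupt) from before the crash is reused.
    failed.role = "slave"
    failed.fsync = True
    # Only now that a rebuilt slave exists is fsync turned off on the master.
    new_master.fsync = False


def fsync_off_is_covered(nodes: list[Node]) -> bool:
    """Invariant: a live node runs with fsync off only while it is the master
    and a slave with fsync on exists. Failed nodes are ignored, since their
    databases will be dropped anyway."""
    live = [n for n in nodes if n.role != "failed"]
    has_safe_slave = any(n.role == "slave" and n.fsync for n in live)
    return all(n.fsync or (n.role == "master" and has_safe_slave) for n in live)


x = Node("X", "master", fsync=False)
y = Node("Y", "slave", fsync=True)
print(fsync_off_is_covered([x, y]))  # True: normal operation
fail_over(x, y)
print(fsync_off_is_covered([x, y]))  # True: Y is master with fsync on
reintroduce(x, y)
print(fsync_off_is_covered([x, y]))  # True: roles swapped, Y now runs with fsync off
```

The window between `fail_over` and `reintroduce` is the exposure the message mentions: Y is the only copy, which is why it keeps fsync on until X has been rebuilt.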
[GENERAL] Dangers of fsync = off
Hello all.

It's clear from the documentation for the fsync configuration option that turning it off may lead to unrecoverable data corruption. I'd like to learn more about why this is possible and how likely it really is.

A quick look at xlog.h reveals that each record in the transaction log contains a CRC checksum, a transaction ID, a length, etc. Assuming the worst thing that can happen due to a crash is that the end of the log is filled with random garbage, there seems to be little danger that the recovery process will misinterpret any of that garbage as a valid transaction record, complete with matching checksum.

If my assumption is incorrect (i.e. garbage at the end of the log is not the worst that can happen), what else might happen, and how would this lead to unrecoverable corruption? Also, are there any filesystems available which avoid such cases?

Sorry if this has been discussed before; if so, please point me to that discussion.

Thanks.

- Joel
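The reasoning about the CRC can be illustrated with a toy log. The record layout below (CRC-32, payload length, transaction ID, payload) is a hypothetical simplification inspired by the fields listed in this message, not PostgreSQL's actual record format. Replay stops at the first record that is truncated or fails its checksum, so garbage at the tail is accepted only if it happens to carry a plausible length and a matching 32-bit CRC, on the order of a 1 in 2^32 chance.

```python
import struct
import zlib

# Hypothetical, simplified record layout (NOT PostgreSQL's real format):
# 4-byte CRC-32 | 4-byte payload length | 4-byte transaction ID | payload
HEADER = struct.Struct("<III")


def make_record(xid: int, payload: bytes) -> bytes:
    """Serialize one record; the CRC covers length, xid and payload."""
    body = struct.pack("<II", len(payload), xid) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


def replay(log: bytes) -> list[tuple[int, bytes]]:
    """Return the records that pass their checks, stopping at the first one
    that is truncated or whose CRC does not match (treated as end of log)."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        crc, length, xid = HEADER.unpack_from(log, pos)
        end = pos + HEADER.size + length
        if end > len(log):
            break  # length field points past the end: torn write or garbage
        if zlib.crc32(log[pos + 4:end]) != crc:
            break  # checksum mismatch: stop replay here
        records.append((xid, log[pos + HEADER.size:end]))
        pos = end
    return records


log = make_record(100, b"INSERT ...") + make_record(101, b"UPDATE ...")
garbage = b"\xde\xad\xbe\xef" * 16  # stand-in for junk left at the tail by a crash
print([xid for xid, _ in replay(log + garbage)])  # [100, 101]

damaged = bytearray(log)
damaged[40] ^= 0xFF  # flip one byte inside the second record's payload
print([xid for xid, _ in replay(bytes(damaged))])  # [100]
```

This only models the case assumed above, a garbage tail on the log; it says nothing about the other failure modes the question asks about.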