Consider a 3 node cluster in which 2 nodes fail and 1 node is then
started. If another node starts while that node is still executing the
synchronization recovery algorithm (for ckpt in this case), the
checkpoint database can become corrupted because of an early commit in
the abort phase.

The solution is simply to remove the call to the commit handler when the
recovery phase is aborted.
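
To illustrate the idea outside of the tree, here is a minimal standalone
sketch. Only the member names sync_abort and sync_activate come from the
patch; the struct layout, the demo_* handlers, the counters, and the
abort_recovery/finish_recovery helpers are hypothetical stand-ins and do
not mirror the real sync.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the sync callback table; only the two
 * member names are taken from the patch. */
struct sync_callbacks {
	void (*sync_abort) (void);
	void (*sync_activate) (void);
};

static int aborted = 0;		/* number of times recovery was aborted */
static int committed = 0;	/* number of times recovery data was committed */

/* Hypothetical handlers that only count how often they run. */
static void demo_abort (void) { aborted++; }
static void demo_activate (void) { committed++; }

/* Abort path: run the abort handler, then clear the commit handler so
 * that a later step cannot commit partially recovered state. */
static void abort_recovery (struct sync_callbacks *cb, int sync_processing)
{
	assert (cb != NULL);
	if (sync_processing && cb->sync_abort != NULL) {
		cb->sync_abort ();
		cb->sync_activate = NULL;
	}
}

/* Later step (assumed for this sketch) that commits only if a commit
 * handler is still registered. */
static void finish_recovery (struct sync_callbacks *cb)
{
	assert (cb != NULL);
	if (cb->sync_activate != NULL) {
		cb->sync_activate ();
	}
}
```

In this sketch, calling abort_recovery() on a table initialized with
{ demo_abort, demo_activate } while sync_processing is nonzero, followed
by finish_recovery(), runs the abort handler once and never runs the
commit handler.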

Patch attached for whitetank and trunk.

Regards
-steve
Index: sync.c
===================================================================
--- sync.c	(revision 1505)
+++ sync.c	(working copy)
@@ -429,6 +429,7 @@
 			sizeof (barrier_data_confchg));
 
 		sync_callbacks_load();
+		log_printf (LOG_LEVEL_NOTICE, "sync_callbacks_load\n");
 
 		/*
 		 * if sync service found, execute it
@@ -457,6 +458,7 @@
 	}
 	if (sync_processing && sync_callbacks.sync_abort != NULL) {
 		sync_callbacks.sync_abort ();
+		sync_callbacks.sync_activate = NULL;
 	}
 	/*
 	 * If no virtual synchrony filter configured, then start
_______________________________________________
Openais mailing list
Openais@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
