On 18 Mar 2011, at 12:13, Mircea Markus wrote:

> Hi,
>
> It's about the stage where TM's recovery process finds an in-doubt
> transaction and notifies the sys admin about it: what hooks does ISPN provide
> to the sys admin in order to "fix" the tx.
> E.g. step >= 3.3 :
> http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png
>
> Here is what I have in mind:
>
> Expose (JMX) two operations:
>
> //all the params together fully describe a xid.
> replayTx(byte[] txBranch, byte[] txId, int formatId);
> forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
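
To make the "three params fully describe a xid" point concrete, here is a minimal sketch of a value type that builds a javax.transaction.xa.Xid from those parameters. SimpleXid, fromHex and decodeHex are hypothetical names for illustration only, not Infinispan API; accepting hex strings is just one assumed way to make the arguments easier to paste into a management console.

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

// Hypothetical helper, not part of Infinispan: an immutable Xid built from the
// three parameters of the proposed JMX operations.
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] globalTxId;
    private final byte[] branchQualifier;

    SimpleXid(byte[] txBranch, byte[] txId, int formatId) {
        // Defensive copies keep the Xid immutable.
        this.branchQualifier = txBranch.clone();
        this.globalTxId = txId.clone();
        this.formatId = formatId;
    }

    // Assumption: the admin supplies hex strings such as "0a1b" instead of raw bytes.
    static SimpleXid fromHex(String txBranchHex, String txIdHex, int formatId) {
        return new SimpleXid(decodeHex(txBranchHex), decodeHex(txIdHex), formatId);
    }

    // Decodes two hex digits per byte; rejects odd lengths and non-hex characters.
    static byte[] decodeHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return globalTxId.clone(); }
    @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }

    // Two Xids are equal when format id, global tx id and branch qualifier all match.
    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleXid)) return false;
        SimpleXid other = (SimpleXid) o;
        return formatId == other.formatId
            && Arrays.equals(globalTxId, other.globalTxId)
            && Arrays.equals(branchQualifier, other.branchQualifier);
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(globalTxId))
            + Arrays.hashCode(branchQualifier);
    }
}
```

With something like this, a string-based variant of the two operations could convert its arguments with SimpleXid.fromHex(...) before looking up the PrepareCommand.
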

You expect a sysadmin to type a byte array into a JMX console? :-) You might get death threats from sysadmins...

> Here is how these two ops would work:
> A. replayTx
>   1. the node has locally the PrepareCommand associated with that XID
>      - re-issues a prepare: TransactionXAResource.prepare
>      - if successful re-issues a commit: TransactionXAResource.commit
>      - if failure happens at any step the user is informed and she/he can
>        re-do the JMX call
>      - if success the recovery information is removed from the cluster
>        (async)
>   2. the node doesn't have the PrepareCommand associated with that XID
>      - broadcast ReplayTxCommand (Xid)
>      - when a node receives ReplayTxCommand
>        - if it doesn't have a PreparedCommand associated with the Xid,
>          it ignores it
>        - if it has a PreparedCommand...
>          - is it the first in the view that has it [1]?

How does a node know the answer to this question? Is the list of nodes that holds the prepare replay info stored on the PrepareCommand?

>            - yes. Execute A.1, then return the result to the node
>              that broadcast ReplayTxCommand. This is guaranteed to happen on
>              at most [2] one node in the cluster
>            - no. Ignores it.
>      - if success the recovery information is removed from the cluster
>        (async)
> B. rollbackTx
>   - node broadcasts RollbackCommand
>   - each node that has the PrepareCommand forces a rollback
>   - each node that doesn't have the PreparedCommand ignores it
>   - if success the recovery information is removed from the cluster (async)
>
> Cheers,
> Mircea
>
> [1] this is determined by building the set of nodes on which tx spreads,
> based on tx's state. Then determine the first in the view.
> [2] it is possible not to happen on any node as the PrepareCommand might have
> been removed from all nodes in between (node failures, expiration from the
> recovery cache).
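
The "first in the view that has it" rule from footnote [1] can be sketched as below. This is only an illustration under stated assumptions: node addresses are modelled as plain strings, the view is an ordered list, and the set of holders is assumed to have already been derived from the tx's state. ReplayOwnerSelector and its methods are hypothetical names, not Infinispan classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper illustrating footnote [1]; not part of Infinispan.
final class ReplayOwnerSelector {

    // Returns the first node, in view order, that holds the PrepareCommand.
    // Empty when no node holds it any more, which is the case in footnote [2].
    static Optional<String> firstHolderInView(List<String> view, Set<String> holders) {
        for (String node : view) {
            if (holders.contains(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty();
    }

    // A node replays the tx only if it is itself the first holder in the view,
    // so at most one node in the cluster executes step A.1.
    static boolean shouldReplay(String self, List<String> view, Set<String> holders) {
        return firstHolderInView(view, holders).map(self::equals).orElse(false);
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C");
        Set<String> holders = new HashSet<>(Arrays.asList("C", "B"));
        System.out.println(firstHolderInView(view, holders)); // Optional[B]
        System.out.println(shouldReplay("C", view, holders)); // false
    }
}
```

Every node has to compute the same holder set and see the same view for this to select a single node; how the holder set is obtained is exactly the open question raised above.
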

--
Manik Surtani
[email protected]
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
