Hi,

It's about the stage where the TM's recovery process finds an in-doubt transaction
and notifies the sysadmin about it: what hooks does ISPN provide to the sysadmin
in order to "fix" the tx?
E.g. steps 3.3 onwards in:
http://community.jboss.org/servlet/JiveServlet/showImage/102-16552-14-11811/3_non_originator_failure.png

Here is what I have in mind:

Expose two operations via JMX:

   // Together, the three params fully describe an XID
   // (XA format id, global transaction id, branch qualifier).
   replayTx(byte[] txBranch, byte[] txId, int formatId);
   forceRollbackTx(byte[] txBranch, byte[] txId, int formatId);
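The three parameters correspond to the components of a standard XA `javax.transaction.xa.Xid`. Below is a minimal sketch of how a node might rebuild the XID from the JMX arguments so it can serve as a lookup key for the local recovery information. `RecoveredXid` is a hypothetical name, not an Infinispan class, and the mapping txId → global transaction id, txBranch → branch qualifier is assumed from the parameter names.

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

// Hypothetical helper (not an Infinispan class): rebuilds an XID from the
// three JMX parameters of replayTx / forceRollbackTx.
final class RecoveredXid implements Xid {
    private final int formatId;
    private final byte[] txId;     // assumed: global transaction id (gtrid)
    private final byte[] txBranch; // assumed: branch qualifier (bqual)

    RecoveredXid(byte[] txBranch, byte[] txId, int formatId) {
        // XA caps both components at 64 bytes
        if (txId.length > Xid.MAXGTRIDSIZE || txBranch.length > Xid.MAXBQUALSIZE) {
            throw new IllegalArgumentException("XID component longer than 64 bytes");
        }
        this.formatId = formatId;
        this.txId = txId.clone();         // defensive copies: arrays are mutable
        this.txBranch = txBranch.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return txId.clone(); }
    @Override public byte[] getBranchQualifier() { return txBranch.clone(); }

    // Value equality over all three components, so two XIDs built from the
    // same JMX arguments find the same recovery entry.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RecoveredXid)) return false;
        RecoveredXid other = (RecoveredXid) o;
        return formatId == other.formatId
                && Arrays.equals(txId, other.txId)
                && Arrays.equals(txBranch, other.txBranch);
    }

    @Override public int hashCode() {
        int h = formatId;
        h = 31 * h + Arrays.hashCode(txId);
        return 31 * h + Arrays.hashCode(txBranch);
    }
}
```

Value equality matters here because the admin supplies fresh byte arrays on every JMX call; identity comparison would never match a stored entry.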

Here is how these two ops would work:
A. replayTx 
    1. the node has the PrepareCommand associated with that XID locally:
        - it re-issues the prepare: TransactionXAResource.prepare
        - if that succeeds, it re-issues the commit: TransactionXAResource.commit
        - if a failure happens at any step, the user is informed and can re-issue the JMX call
        - on success, the recovery information is removed from the cluster (async)
    2. the node doesn't have the PrepareCommand associated with that XID locally:
        - it broadcasts a ReplayTxCommand(Xid)
        - when a node receives the ReplayTxCommand:
                - if it doesn't have a PrepareCommand associated with the Xid, it ignores it
                - if it has the PrepareCommand:
                        - is it the first node in the view that has it [1]?
                                - yes: it executes A.1, then returns the result to the node that broadcast the ReplayTxCommand. This is guaranteed to happen on at most one node in the cluster [2].
                                - no: it ignores the command.
        - on success, the recovery information is removed from the cluster (async)
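A minimal sketch of the replayTx flow above, under simplifying assumptions: node addresses are plain strings, the view is an ordered list, the set of nodes the tx spans (footnote [1]) is given, and a plain `javax.transaction.xa.XAResource` stands in for TransactionXAResource. `ReplaySketch`, `Outcome` and the method names are hypothetical; messaging, returning the result to the broadcasting node and the async clean-up of recovery information are left out.

```java
import java.util.List;
import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical sketch of replayTx; not Infinispan code.
final class ReplaySketch {

    enum Outcome { SUCCEEDED, FAILED, IGNORED }

    // Step A.1: re-issue the prepare, then the commit if the prepare voted OK.
    // Any XAException is reported so the admin can re-issue the JMX call.
    static Outcome replayLocally(XAResource resource, Xid xid) {
        try {
            int vote = resource.prepare(xid);
            if (vote == XAResource.XA_OK) {
                resource.commit(xid, false);   // second phase of 2PC (not one-phase)
            }                                  // XA_RDONLY: branch already complete
            return Outcome.SUCCEEDED;
        } catch (XAException e) {
            return Outcome.FAILED;
        }
    }

    // Footnote [1]: among the nodes the tx spans, pick the first in view order.
    static String designatedReplayer(List<String> view, Set<String> txNodes) {
        for (String member : view) {
            if (txNodes.contains(member)) return member;
        }
        return null;   // none of the tx's nodes is still in the view
    }

    // Step A.2, receiver side: replay only if this node holds the
    // PrepareCommand AND is the designated node; otherwise ignore.
    static Outcome onReplayTxCommand(String self, boolean hasPrepare, XAResource resource,
                                     Xid xid, List<String> view, Set<String> txNodes) {
        if (!hasPrepare) return Outcome.IGNORED;
        if (!self.equals(designatedReplayer(view, txNodes))) return Outcome.IGNORED;
        return replayLocally(resource, xid);
    }
}
```

Because every receiver evaluates the same deterministic rule against the same view, at most one node replays. If the designated node has already lost its PrepareCommand, no node replays, which is the case footnote [2] describes.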
B. forceRollbackTx
   - the node broadcasts a RollbackCommand
   - each node that has the PrepareCommand forces a rollback
   - each node that doesn't have the PrepareCommand ignores it
   - on success, the recovery information is removed from the cluster (async)
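A similar sketch of forceRollbackTx, with the cluster simulated in-process. `RollbackNode`, the string XID key and the `Runnable` rollback actions are hypothetical stand-ins for the node's recovery information, not Infinispan classes. Unlike replayTx there is no election: every node that holds the PrepareCommand rolls back, so one broadcast may act on several nodes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node handler for the broadcast RollbackCommand; not Infinispan code.
final class RollbackNode {
    // Stand-in for the node's recovery information: XID key -> rollback action.
    private final Map<String, Runnable> prepared = new ConcurrentHashMap<>();

    void addPrepared(String xidKey, Runnable rollbackAction) {
        prepared.put(xidKey, rollbackAction);
    }

    boolean hasPrepared(String xidKey) {
        return prepared.containsKey(xidKey);
    }

    // Handling of the RollbackCommand on one node.
    boolean onRollbackCommand(String xidKey) {
        Runnable rollback = prepared.get(xidKey);
        if (rollback == null) return false;  // no PrepareCommand here: ignore
        rollback.run();                      // force the local rollback; if it throws, the entry is kept
        prepared.remove(xidKey);             // clean up only after success (async in the proposal)
        return true;
    }

    // Originator side: "broadcast" to every member and count the rollbacks.
    static int broadcastRollback(List<RollbackNode> cluster, String xidKey) {
        int rolledBack = 0;
        for (RollbackNode node : cluster) {
            if (node.onRollbackCommand(xidKey)) rolledBack++;
        }
        return rolledBack;
    }
}
```

Removing the entry only after the rollback action returns keeps the recovery information around on failure, so the admin can simply repeat the JMX call; a repeated broadcast after success is a harmless no-op.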

Cheers,
Mircea

[1] This is determined by building, from the tx's state, the set of nodes the tx
spans, and then taking the first of those nodes in the current view.
[2] It may happen on no node at all, as the PrepareCommand might have been
removed from all nodes in the meantime (node failures, expiration from the
recovery cache).

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev