On 10/15/2010 10:51 AM, Patricia Shanahan wrote:
> I seem to have reached a level of understanding of River that lets me
> track down some bugs, and bug fixing is an obviously useful activity, so
> I plan to spend some time on it.
>
> As I mentioned in the "Request for testing help" thread, I've
> investigated the GetStateTest hang. The test is spinning waiting for the
> TransactionManager's getState method to throw an exception because it
> has discarded the aborted transaction. As far as I can tell, there is no
> requirement that a TransactionManager discard a transaction, even when
> it is permitted to do so.
>
> I plan to file a Jira for the test, and modify it to spin for a limited
> time. Treat either UnknownTransactionException or continuous return of
> ABORTED status for e.g. one minute as successful test completion.

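
A minimal sketch of the bounded spin described above, not the actual GetStateTest code. StateSource, UnknownTxnException, and BoundedAbortCheck are hypothetical stand-ins so the example compiles on its own. The ABORTED value is local to the sketch and stands in for the constant in net.jini.core.transaction.server.TransactionConstants.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for net.jini.core.transaction.UnknownTransactionException. */
class UnknownTxnException extends Exception {}

/** Hypothetical view of the single manager call that the test polls. */
interface StateSource {
    int getState(long txnId) throws UnknownTxnException;
}

final class BoundedAbortCheck {
    /** Stand-in for TransactionConstants.ABORTED; the value is local to this sketch. */
    static final int ABORTED = 6;

    /**
     * Polls getState until the time limit expires.
     * Returns true if the manager discarded the transaction, or if it
     * reported ABORTED on every poll for the whole window.
     * Returns false if any other state was seen or the thread was interrupted.
     */
    static boolean abortedOrDiscarded(StateSource mgr, long txnId,
                                      long limitMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(limitMillis);
        do {
            try {
                if (mgr.getState(txnId) != ABORTED) {
                    return false; // any state other than ABORTED is a failure
                }
            } catch (UnknownTxnException e) {
                return true; // the manager chose to discard the transaction
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        } while (System.nanoTime() - deadline < 0);
        return true; // stayed ABORTED for the entire window
    }
}
```

With a limit of 60000 ms this matches the one-minute window suggested above. Both outcomes count as a pass because the manager is permitted, but not required, to discard the transaction.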
I've investigated this some more, and the test is revealing a real
problem in the transaction abort implementation.
If an abort times out, the implementation hands the problem to a
SettlerTask, a subclass of RetryTask, which retries the abort. However,
com.sun.jini.mahalo.TxnManagerImpl's abort code checks for an attempt to
abort a transaction for which abort has already been called, and throws
    new CannotAbortException("Transaction previously aborted")
The net.jini.core.transaction.server.TransactionManager interface, which
it implements, specifies that abort throws CannotAbortException for a
transaction that has reached the COMMITTED state, but says nothing about
throwing it for a transaction that is in the ABORTED state.
I propose modifying TxnManagerImpl to make it match the interface
declaration, and allow an abort to be retried. This may break other
tests if they assume the behavior that TxnManagerImpl currently implements.
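
A rough sketch of the proposed behavior, not the real TxnManagerImpl code. IdempotentAbortSketch and CannotAbortSketchException are hypothetical stand-ins, and the state values are local to the example. The point is only that a second abort on an already ABORTED transaction returns quietly, while abort on a COMMITTED transaction still throws.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for net.jini.core.transaction.CannotAbortException. */
class CannotAbortSketchException extends Exception {
    CannotAbortSketchException(String msg) { super(msg); }
}

/** Hypothetical, much simplified manager that tracks only a state per transaction id. */
final class IdempotentAbortSketch {
    // Stand-in state values, local to this sketch.
    static final int ACTIVE = 1, COMMITTED = 5, ABORTED = 6;

    private final Map<Long, Integer> states = new HashMap<>();

    synchronized void begin(long id) { states.put(id, ACTIVE); }

    synchronized void markCommitted(long id) { states.put(id, COMMITTED); }

    synchronized int getState(long id) {
        Integer state = states.get(id);
        if (state == null) {
            throw new IllegalArgumentException("unknown transaction " + id);
        }
        return state;
    }

    synchronized void abort(long id) throws CannotAbortSketchException {
        int state = getState(id);
        if (state == COMMITTED) {
            // The only case the interface describes as an abort failure.
            throw new CannotAbortSketchException("Transaction already committed");
        }
        if (state == ABORTED) {
            return; // a retried abort is a no-op rather than an error
        }
        states.put(id, ABORTED);
    }
}
```

Under this assumption, a retrying task such as SettlerTask could call abort again after a timeout without tripping over its own earlier attempt.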
Patricia