I've confirmed the temporary fix discussed below with the trunk code,
having reverted the rest of the changes I made to simplify the problem
and extract more data. I'm now rerunning the javaspaces category,
because I think several failures have the same general flavor and may be
fixed by this one change.
Patricia
On 10/26/2010 1:22 PM, Patricia Shanahan wrote:
On 10/26/2010 8:06 AM, Patricia Shanahan wrote:
Greg Trasuk wrote:
The transaction spec says (under "Joining a Transaction") that "The join
method throws CannotJoinException if the transaction is known to the
manager but is no longer active."
So it would appear that this behaviour is a bug, if the transaction's
lease has actually expired.
I had a quick look through TxnManagerImpl and I don't see any
"auto-renew-on-join" behaviour, so that seems kind of odd. In fact, in
the join(...) method, it finds a TxnManagerTransaction instance, then
calls join(..) on it, which appears to check whether the transaction is
expired, and should throw CannotJoinException if it is expired.
There is an alternative I need to check. I want see if the JavaSpace is
caching the transaction information. Maybe it isn't even checking with
the TransactionManager if it thinks it has already joined the
transaction, regardless of the length of the transaction's lease.
My guess was correct! (This is the first time I've found a River bug
through an intuitive leap, rather than step-by-step brute force debug.)
The method enterTxn in OutriggerServerImpl does the dirty work. It uses
a table that maps TransactionManager-id pairs to its internal
transaction representation, com.sun.jini.outrigger.Txn.
It begins by doing a look-up on the table which will either get a Txn or
null. That is followed by this comment:
"If we find the txn with the probe, we're all done. Just return.
Otherwise, we need to join the transaction. Of course, there is a race
condition here. While we are attempting to join, others might be
attempting to join also. We are careful to ensure that only one of the
racing threads is able to update our internal data structures with our
internal representation of this transaction. NB: There are better ways
of doing this which could ensure we never make an unnecessary remote
call that we may want to explore later."
if (txn == null) {
final TransactionManager mgr = (TransactionManager)
transactionManagerPreparer.prepareProxy(tr.mgr);
tr = new ServerTransaction(mgr, tr.id);
tr.join(participantProxy, crashCount);
txn = txnTable.put(tr);
}
Commenting out the "if (txn == null)" test, so that the join is done
regardless, makes the test pass.
My next problem is how to really fix this. Presumably, the caching was
not done for fun. There is at least some chance that it is a fix for a
real performance problem, and merely commenting out the test would bring
back that problem. On the other hand, I'm not sure of all the
consequences, including effects on recovery, of going on blindly using a
transaction id after the transaction has really been discarded by its
manager.
One approach would be to keep the caching, but try to make it correct.
Is there any way to find out the scheduled lease termination of an
existing transaction? If there were, OutriggerSeverImpl could store it
in the Txn object, and re-do the join if time now is greater.
On the other hand, I rather feel that if the state of a transaction can
be safely cached, with a lease-based time limit, it should be done by
the TransactionManager's proxy. It is more tightly coupled to the
TransactionManager implementation, and any performance benefit would
apply to all TransactionManager uses, not just the outrigger JavaSpace
implementation.
Thoughts? Suggestions?
Patricia