On 8/29/2010 1:45 PM, Peter Firmstone wrote:
We've got MultithreadedTC in the test/lib directory, which allows for
specific interleaving of threads to test concurrency issues.
http://www.cs.umd.edu/projects/PL/multithreadedtc/overview.html
Very, very interesting. One of my projects is an attempt to wrap a
ServiceDiscoveryManager in stub versions of the classes with which it
interacts, so that I can force lots of events to happen as close as
possible to the same time. That library might make it possible to force
specific test cases.
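As a rough illustration of the "as close as possible to the same time" idea, here is a minimal sketch using only java.util.concurrent. It is not MultithreadedTC and not River code; SimultaneousEvents, EventSink and fireTogether are hypothetical names, and EventSink merely stands in for whatever listener the stubbed classes would call. Each event gets its own thread, all threads park on a latch, and one countDown() releases them together.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SimultaneousEvents {

    /** Hypothetical stand-in for a listener on the object under test. */
    interface EventSink {
        void onEvent(int id);
    }

    /**
     * Starts one thread per event, parks them all on a latch, then releases
     * them together. Returns true if every delivery finished within 10 seconds.
     */
    static boolean fireTogether(int events, EventSink sink) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(events);
        CountDownLatch ready = new CountDownLatch(events); // all threads are parked
        CountDownLatch go = new CountDownLatch(1);          // the starting gun
        CountDownLatch done = new CountDownLatch(events);   // all deliveries finished
        try {
            for (int i = 0; i < events; i++) {
                final int id = i;
                pool.execute(() -> {
                    ready.countDown();
                    try {
                        go.await();
                        sink.onEvent(id);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            ready.await();   // wait until every thread is about to block on go
            go.countDown();  // release them all at once
            return done.await(10, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger received = new AtomicInteger();
        boolean finished = fireTogether(16, id -> received.incrementAndGet());
        System.out.println("finished=" + finished + " received=" + received.get());
    }
}
```

A latch only makes the events nearly simultaneous; it cannot pin down one particular interleaving, which is the part a tick-based tool would have to supply.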
I want to do the same to JoinManager, especially so that I can force
retries, which do not appear to be tested.
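To make "force retries" concrete, a stub can fail a fixed number of times before succeeding, so the retry path runs deterministically. The sketch below assumes nothing about JoinManager's internals: ForcedRetry, FlakyStub and retry are hypothetical names, and retry is only a placeholder for the code under test.

```java
import java.rmi.RemoteException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class ForcedRetry {

    /** Hypothetical stub: fails the first `failures` calls, then succeeds. */
    static final class FlakyStub implements Callable<String> {
        private final int failures;
        private final AtomicInteger calls = new AtomicInteger();

        FlakyStub(int failures) {
            this.failures = failures;
        }

        @Override
        public String call() throws RemoteException {
            int n = calls.incrementAndGet();
            if (n <= failures) {
                throw new RemoteException("forced failure " + n);
            }
            return "ok";
        }

        int calls() {
            return calls.get();
        }
    }

    /** Placeholder for the code under test: retries up to maxAttempts times. */
    static <T> T retry(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        FlakyStub stub = new FlakyStub(2);
        String result = retry(stub, 5);
        System.out.println(result + " after " + stub.calls() + " calls");
    }
}
```

A test can then assert on the call count, which shows that the retry path actually ran rather than the first attempt happening to succeed.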
Probably best to wait until I've got the bug sorted, though.
An update on progress:
It appears I've sorted the null pointer issue and the serialization
issue, so I'm no longer getting exceptions. However, I'm still getting
test failures in the ServiceDiscoveryManager, which appear to be related
to timing issues, event counting, etc.
I integrated Brian's patch, but that fix isn't related to this
particular case.
The next step, over the coming week, will be to go back to the stable
build and incrementally add recent changes, including those I have
locally now, so I can pinpoint the causes of the problems.
One of the unfortunate things about timing problems is that a perfectly
valid change can unmask a latent concurrency bug. That is why I've held
off on pushing a new TaskManager while I'm unsure about the correctness
of SDM and JoinManager's use of TaskManager.
Did you want me to commit my current local changes to SVN? I've been
holding off, since additional changes that haven't solved the problem
won't make it any clearer to those trying to solve it.
If you have fixes for problems you do understand, I would vote for
checking them in.
Patricia