On Jul 6, 2010, at 6:03 PM, Patricia Shanahan wrote:

> Gregg Wonderly wrote:
>> Patricia Shanahan wrote:
> ...
>>> If one of the River developers has an intranet test environment it may be
>>> possible to simulate the effect of running over the Internet by a similar
>>> trick. Create some workload that keeps the network very busy, and run it in
>>> parallel with a quality assurance test.
>>>
>>> In some cases it may not matter which of two transactions is done first,
>>> but it is important to make sure there is a consistent order between them.
>> More recently, one of my most favorite test environments is to bring up open
>> solaris on an i7 processor based machine with some reasonable amount of
>> memory (8GB or more) and then put 8 or more instances of linux on it all
>> running the same build, and then test there with appropriate loading.
>> You'll get latency injection because of machine resource contention, but
>> you'll also get 8, independent OS and Java VM layers that will be readily
>> able to provide just about any unexplainable behavior you need to test with
>> :-)
>
> Sounds nice and chaotic. When I have a new TaskManager and related changes
> working on my system, I'll ask you to take it for a spin.
>
> One problem I don't think that would reproduce is the ambiguity between a
> transaction taking a very long time because of load, and a transaction that
> is not going to complete because a server that was working on it has crashed.
> That issue always gives me headaches.

I deal with this issue as well. It would be nice if there were a more "instant" way for transaction participants to be flagged as "lost", so that hung transactions could be cancelled more readily. I have an application with more than 20 participants on 6 servers, and if one of those doesn't want to play, it can take a while to discover just which participant is the problem when debugging.

Gregg Wonderly
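
To make the "lost participant" idea concrete, here is a minimal sketch, assuming each participant sends a periodic heartbeat on a path separate from its transaction work. Nothing here is a River API: `ParticipantLiveness`, `heartbeat`, and `lostParticipants` are hypothetical names, and the timeout is an arbitrary choice. A participant that is slow but alive keeps reporting in, while a crashed one goes silent and shows up by name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of River: remembers the last heartbeat per participant.
final class ParticipantLiveness {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    ParticipantLiveness(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a participant reported in at the given time.
    void heartbeat(String participantId, long nowMillis) {
        lastSeenMillis.put(participantId, nowMillis);
    }

    // Participants whose last heartbeat is older than the timeout, sorted by id.
    List<String> lostParticipants(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                lost.add(e.getKey());
            }
        }
        Collections.sort(lost);
        return lost;
    }
}
```

A transaction monitor could poll `lostParticipants` and name the suspect right away instead of waiting for the whole transaction to time out. This only narrows the ambiguity: a participant starved by load can also miss heartbeats, so a "lost" result is a suspicion, not proof of a crash.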
