On Jul 6, 2010, at 6:03 PM, Patricia Shanahan wrote:

> Gregg Wonderly wrote:
>> Patricia Shanahan wrote:
> ...
>>> If one of the River developers has an intranet test environment, it may be 
>>> possible to simulate the effect of running over the Internet by a similar 
>>> trick: create some workload that keeps the network very busy, and run it in 
>>> parallel with a quality assurance test.
>>> 
>>> In some cases it may not matter which of two transactions is done first, 
>>> but it is important to make sure there is a consistent order between them.
>> More recently, one of my favorite test environments is to bring up 
>> OpenSolaris on an i7-based machine with a reasonable amount of 
>> memory (8GB or more), put 8 or more instances of Linux on it, all 
>> running the same build, and then test there with appropriate loading.  
>> You'll get latency injection because of machine resource contention, but 
>> you'll also get 8 independent OS and Java VM layers that will readily 
>> provide just about any unexplainable behavior you need to test with 
>> :-)
> 
> Sounds nice and chaotic. When I have a new TaskManager and related changes 
> working on my system, I'll ask you to take it for a spin.
> 
> One problem I don't think that would reproduce is the ambiguity between a 
> transaction taking a very long time because of load, and a transaction that 
> is not going to complete because a server that was working on it has crashed. 
> That issue always gives me headaches.

I deal with this issue as well.  It would be nice if there were a more 
"instant" way to mark transaction participants as "lost", so that hung 
transactions could be cancelled sooner.

I have an application with more than 20 participants across 6 servers, and if 
one of them doesn't want to play, it can take a while to discover which 
participant is the problem, for debugging etc.
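For illustration only, a minimal Java sketch of a diagnostic that narrows down the problem participant, assuming each participant can be reached through some cheap round-trip call. The `ParticipantProbe` class, the `findUnresponsive` method and the per-participant `Callable` probes are hypothetical, not part of the River/Jini transaction API. All probes run in parallel against one shared deadline, so the result lists the participants that failed or were too slow. As noted above, a timeout only makes a participant a suspect: a heavily loaded server and a crashed one look the same from here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParticipantProbe {

    /**
     * Hypothetical helper: runs every probe in parallel and returns, in map
     * order, the names of participants whose probe threw an exception or did
     * not finish before a single shared deadline of timeoutMillis.
     */
    public static List<String> findUnresponsive(Map<String, Callable<Void>> probes,
                                                long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, probes.size()));
        try {
            // Start all probes at once so one slow participant doesn't delay the others.
            Map<String, Future<Void>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Void>> e : probes.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            List<String> suspects = new ArrayList<>();
            for (Map.Entry<String, Future<Void>> e : pending.entrySet()) {
                long remaining = Math.max(0, deadline - System.nanoTime());
                try {
                    e.getValue().get(remaining, TimeUnit.NANOSECONDS);
                } catch (TimeoutException | ExecutionException ex) {
                    // Slow or failed: a suspect, not proof of a crash.
                    suspects.add(e.getKey());
                }
            }
            return suspects;
        } finally {
            // Interrupt any probes still blocked on a slow participant.
            pool.shutdownNow();
        }
    }
}
```

A single shared deadline keeps the total diagnostic time bounded by `timeoutMillis` regardless of how many participants there are, which matters when a transaction spans 20 or more of them.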

Gregg Wonderly
