Status: New
Owner: ----
Labels: Component-Diameter Type-Enhancement Priority-Medium Version-1.4.0 Release-Type-FINAL Roadmap-Fix

New issue 2444 by [email protected]: LocalTimerFacilityImpl not thread safe: causes infinite loop in HashMap under high concurrency
http://code.google.com/p/mobicents/issues/detail?id=2444

What steps will reproduce the problem?
1. Use a diameter client with a number of threads (say 10) to send about 10000 requests each to a stateless interface (e.g. CxDx) on the SLEE via the diameter RA. Ensure the client and server are on multi-core machines.
2. Ensure debug logging is off in the SLEE and Diameter stack to maximise concurrency and the chance of hitting the race condition.
3. After the load test, notice that the SLEE java process runs at 80%+ CPU utilisation. A thread dump shows the ApplicationSession threads in RUNNING state, all inside java.util.HashMap, and this continues forever. The cause is an infinite loop in java.util.HashMap when gets/puts are performed without adequate locking: the root cause is a non-thread-safe HashMap being updated from multiple threads without synchronisation.

What is the expected output? What do you see instead?
I would expect the diameter stack not to "die" under load. Instead, it consumes all available CPU and is effectively dead until restarted.


What version of the product are you using? On what operating system?
jdiameter-impl-1.5.4.1-build415.jar on Solaris

Please provide any additional information below.

Around line 52 of LocalTimerFacilityImpl, a HashMap is defined as follows:
private HashMap<String, ScheduledFuture> idToFutureMapping;
and on line 59, it is initialised as a new HashMap.

A single LocalTimerFacilityImpl is created in StackImpl and hence is utilised by any number of threads concurrently. This specific use case has multiple threads running line 102 in CxDxSession: this.timerId_timeout = super.timerFacility.schedule(sessionId, TIMER_NAME_MSG_TIMEOUT, _TX_TIMEOUT);

Multiple threads then end up modifying the HashMap concurrently causing the issue.

I have successfully changed the HashMap to a ConcurrentHashMap and the error is no longer reproducible, as the map is now thread safe. I request that this change (or similar) be made in the trunk.
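For illustration, here is a minimal, self-contained sketch of the proposed fix. The class and method names (TimerMapSketch, schedule, cancel) are hypothetical and only mimic the shape of LocalTimerFacilityImpl's id-to-future map; the point is that swapping the shared HashMap for a ConcurrentHashMap makes concurrent put/remove calls safe without external locking:

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical sketch, not the actual jdiameter code: many threads
// register and cancel timers against one shared map, as happens when
// multiple sessions call timerFacility.schedule(...) concurrently.
// With a plain HashMap this can corrupt the bucket table and spin
// forever; ConcurrentHashMap makes each put/remove thread safe.
public class TimerMapSketch {
    // ConcurrentHashMap instead of HashMap: safe for concurrent access
    private final Map<String, ScheduledFuture<?>> idToFutureMapping =
            new ConcurrentHashMap<>();
    final ScheduledExecutorService executor =
            Executors.newScheduledThreadPool(4);

    public void schedule(String sessionId, long delayMs) {
        // remove the mapping when the timer fires
        ScheduledFuture<?> f = executor.schedule(
                () -> idToFutureMapping.remove(sessionId),
                delayMs, TimeUnit.MILLISECONDS);
        idToFutureMapping.put(sessionId, f);
    }

    public void cancel(String sessionId) {
        ScheduledFuture<?> f = idToFutureMapping.remove(sessionId);
        if (f != null) {
            f.cancel(false);
        }
    }

    public static void main(String[] args) throws Exception {
        TimerMapSketch sketch = new TimerMapSketch();
        ExecutorService pool = Executors.newFixedThreadPool(10);
        // 10 threads scheduling and cancelling concurrently,
        // loosely modelled on the load test described above
        for (int t = 0; t < 10; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    String sessionId = "session-" + id + "-" + i;
                    sketch.schedule(sessionId, 60_000);
                    sketch.cancel(sessionId);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        sketch.executor.shutdownNow();
        // every schedule was cancelled, so the map should be empty
        System.out.println("remaining=" + sketch.idToFutureMapping.size());
    }
}
```

Run with a plain HashMap in place of the ConcurrentHashMap, the same program can hang or corrupt the map under contention; with ConcurrentHashMap it completes and the map drains cleanly.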

Paul


