Status: New
Owner: ----
Labels: Component-Diameter Type-Enhancement Priority-Medium Version-1.4.0
Release-Type-FINAL Roadmap-Fix
New issue 2444 by [email protected]: LocalTimerFacilityImpl not
thread safe: causes infinite loop in HashMap under high concurrency
http://code.google.com/p/mobicents/issues/detail?id=2444
What steps will reproduce the problem?
1. Use a diameter client with a number of threads (say 10) to send about
10000 requests each to a stateless interface (e.g. CxDx) on the SLEE via
the diameter RA. Ensure the client and server are on multi-core machines.
2. Ensure debug logging is off in the SLEE and Diameter stack to maximise
concurrency and the chance of hitting the race condition.
3. After the load test, notice that the SLEE Java process runs at 80%+ CPU
utilisation. A thread dump shows the ApplicationSession threads all in
RUNNABLE state and all inside java.util.HashMap, and this continues
indefinitely. This is the well-known infinite loop that can occur in
java.util.HashMap when gets/puts are performed without adequate locking.
The root cause is a non-thread-safe HashMap being updated from multiple
threads without synchronisation.
What is the expected output? What do you see instead?
I would expect the Diameter stack to survive load conditions without
"dying". Instead, it consumes all available CPU and is effectively dead
until restarted.
What version of the product are you using? On what operating system?
jdiameter-impl-1.5.4.1-build415.jar on Solaris
Please provide any additional information below.
Around line 52 of LocalTimerFacilityImpl, a HashMap is declared as follows:
private HashMap<String, ScheduledFuture> idToFutureMapping;
and on line 59 it is initialised as a new HashMap.
A single LocalTimerFacilityImpl is created in StackImpl and hence is
utilised by any number of threads concurrently. This specific use case has
multiple threads running line 102 in CxDxSession:
this.timerId_timeout = super.timerFacility.schedule(sessionId,
TIMER_NAME_MSG_TIMEOUT, _TX_TIMEOUT);
Multiple threads then end up modifying the HashMap concurrently, causing
the issue.
I have successfully changed the HashMap to a ConcurrentHashMap and the
error is no longer reproducible, as the map is now thread-safe. I request
that this change (or a similar one) be made in the trunk.
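A minimal sketch of the proposed one-line fix, showing many threads safely
registering timers against a ConcurrentHashMap. The class and method names
below are illustrative stand-ins for the real stack code, and a String
placeholder replaces ScheduledFuture so the sketch is self-contained:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Illustrative stand-in for LocalTimerFacilityImpl; not the actual stack code.
public class TimerMapSketch {
    // was: private HashMap<String, ScheduledFuture> idToFutureMapping;
    // ConcurrentHashMap makes concurrent put/get/remove safe without
    // external locking, avoiding the HashMap resize infinite loop.
    private final Map<String, String> idToFutureMapping = new ConcurrentHashMap<>();

    void schedule(String sessionId, String timerName) {
        idToFutureMapping.put(sessionId, timerName);
    }

    public static void main(String[] args) throws InterruptedException {
        TimerMapSketch facility = new TimerMapSketch();
        int threads = 10;
        int perThread = 10000;
        CountDownLatch done = new CountDownLatch(threads);
        // Mirror the load test: 10 threads each scheduling 10000 timers
        // against the single shared facility instance.
        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    facility.schedule("session-" + id + "-" + i, "MSG_TIMEOUT");
                }
                done.countDown();
            }).start();
        }
        done.await();
        // Every distinct key survives; a plain HashMap under the same load
        // can lose entries or spin forever while resizing.
        System.out.println(facility.idToFutureMapping.size());
    }
}
```

Running the same loop against a plain HashMap on a multi-core machine is
what (intermittently) reproduces the hang described above.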
Paul