Re: A new implementation of TaskManager

Patricia Shanahan Mon, 05 Jul 2010 19:26:02 -0700

Peter Firmstone wrote:

Patricia Shanahan wrote:
Now that I've made some progress on building and testing, I'm ready to return to the substantive issues of the TaskManager refactoring project.
I've read the "com.sun.jini.thread lock contention" thread. I think at this point the best strategy is to keep TaskManager, update its interfaces to improve performance, and use API classes available in JDK 1.5 as applicable in its implementation.
I see two objections to directly exposing ThreadPoolExecutor or similar to the TaskManager callers.
1. There are new features in 1.6 that may be useful, but not until River moves to 1.6. I would prefer to avoid having to change all the callers to make use of future features. For example, RunnableFuture is a 1.6 interface.
Even within a ThreadPoolExecutor based implementation, I think the pending tasks can be handled most simply by not handing them over to the executor until they are ready to execute.
Ok, that all makes sense.
2. River may need some global data collection and operations to support resistance to denial of service attacks. Having a central task manager class may make that easier, if it is ever needed.
A good idea.
I have two immediate questions:
1. Are the current tests considered adequate? If nobody has an opinion, I'll review them before starting on refactoring.
I'm unsure, to be honest. I don't have access to the hardware needed to test scalability.

Actually, in this question I was really trying to get at tests for functional correctness. I regard good functional correctness tests as a necessary prior condition for refactoring.

I can find out either by reading test code or by experiment, making some deliberate mistakes and seeing whether the tests detect them. For example, if I make TaskManager occasionally ignore a dependency, will the tests catch me doing it? If not, I need to write some tests that will catch that.

2. What areas of TaskManager performance most need improvement? What are my objectives? The current implementation seems to me to be likely to work quite fast for lightly loaded instances with few tasks, and even fewer pending tasks, but unlikely to scale well. Is that correct? Are there any known test cases that demonstrate performance problems?
Gregg Wonderly reported the original bottleneck, caused by TaskThread extending Thread. Gregg uses his own fork of Jini / Apache River for delayed unmarshalling of proxies; I'm not sure if he'd be able to run the tests in his environment.
Another bottleneck was SecurityManager and AccessController security checks, due to single-thread synchronization of the original net.jini.security.policy.DynamicPolicyProvider.implies(ProtectionDomain, Permission). That was the case until now; it has since changed to an empty SPI, where the administrator has the choice of using ConcurrentDynamicPolicyProvider or DynamicPolicyProviderImpl, which was renamed from the old DynamicPolicyProvider implementation. Sorry for the convoluted explanation.
Since TaskManager may queue Tasks unbounded, as the queue grows, the time window required for single-thread synchronization grows, since it has to check that no other Task in that list should run first and, if one should, return the Task to the back of the queue, all while holding the lock. During this time no TaskThreads can take another Task and no Tasks can be placed in the queue.
I guess a better solution might be to only lock the queue long enough to take a private snapshot for the Task, but then this isn't memory efficient for large queues. Perhaps what's needed is something to delegate the priority decision to, so that locking the queue is limited to adding and removing tasks; then we can use a ConcurrentLinkedQueue.
Perhaps:

  1. When given to the TaskManager, tasks are registered with a
     TaskPriorityManager.
  2. Take Task from queue
  3. Ask the priority manager if the task is ready.
  4. If not ready, return to the back of the queue.
  5. If ready, Execute.
  6. Execution complete, notify TaskPriorityManager.
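
The six steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchTask`, its `runsAfter` field, and the `register` / `isReady` / `completed` method names on `TaskPriorityManager` are hypothetical, the readiness rule (a task is ready once the one task it names has completed) is invented for the example, and the syntax is modern Java rather than JDK 1.5. The drain loop is single-threaded to keep the flow visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical task: a name, plus the name of one task it must run after (or null). */
final class SketchTask {
    final String name;
    final String runsAfter; // assumption: at most one prerequisite, identified by name
    SketchTask(String name, String runsAfter) { this.name = name; this.runsAfter = runsAfter; }
}

/** Hypothetical priority manager: remembers which registered tasks have not completed. */
final class TaskPriorityManager {
    private final Set<String> outstanding = ConcurrentHashMap.newKeySet();
    void register(SketchTask t) { outstanding.add(t.name); }        // step 1
    boolean isReady(SketchTask t) {                                  // step 3
        return t.runsAfter == null || !outstanding.contains(t.runsAfter);
    }
    void completed(SketchTask t) { outstanding.remove(t.name); }     // step 6
}

final class PriorityLoopSketch {
    /** Drains the queue and returns the order in which tasks "executed".
     *  Note: a dependency cycle would make this loop spin forever. */
    static List<String> drain(List<SketchTask> tasks) {
        TaskPriorityManager manager = new TaskPriorityManager();
        Queue<SketchTask> queue = new ConcurrentLinkedQueue<>();
        for (SketchTask t : tasks) { manager.register(t); queue.add(t); }
        List<String> order = new ArrayList<>();
        SketchTask t;
        while ((t = queue.poll()) != null) {   // step 2: take a task from the queue
            if (!manager.isReady(t)) {
                queue.add(t);                  // step 4: not ready, back of the queue
                continue;
            }
            order.add(t.name);                 // step 5: stand-in for executing the task
            manager.completed(t);              // step 6: notify the priority manager
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drain(Arrays.asList(
                new SketchTask("b", "a"), new SketchTask("a", null), new SketchTask("c", "b"))));
    }
}
```

The queue itself is only touched by `add` and `poll`; the ordering decision lives entirely in the priority manager, which is the separation the list describes.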
Tasks could belong to groups, so the tasks in that group are Comparable. The Priority Manager could group the tasks with a ConcurrentHashMap, reducing the number of tasks being compared. A group could just be a string name, or an integer hash, defined by the task implementation.
The Priority Manager could add tasks to each group when enqueued and remove them when complete.
ConcurrentHashMap<Group,SortedSet<Task>> taskGroups;

interface Group {
   String toString();
   boolean equals(Object o);
   int hashCode();
}

interface Task {
   Group getGroup();
   // + existing methods; Task would also be Comparable.
}
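
One way the grouping idea might look in code is sketched below. Everything here is an assumption made for illustration: `GroupedTask`, `GroupRegistry`, the use of a plain String as the group, the `sequence` ordering key (assumed unique within a group), and the rule that a task is ready when nothing earlier in its own group is outstanding. It also uses ConcurrentSkipListSet, which arrived in Java 6, and Java 8 syntax.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical task that knows its group and its order within that group. */
final class GroupedTask implements Comparable<GroupedTask> {
    final String group;   // a group could just be a string name
    final long sequence;  // assumption: unique within a group
    GroupedTask(String group, long sequence) { this.group = group; this.sequence = sequence; }
    @Override public int compareTo(GroupedTask other) { return Long.compare(sequence, other.sequence); }
}

/** Hypothetical priority-manager bookkeeping: one sorted set of outstanding tasks per group. */
final class GroupRegistry {
    private final ConcurrentHashMap<String, ConcurrentSkipListSet<GroupedTask>> taskGroups =
            new ConcurrentHashMap<>();

    /** Called when a task is enqueued. */
    void enqueued(GroupedTask t) {
        taskGroups.computeIfAbsent(t.group, g -> new ConcurrentSkipListSet<>()).add(t);
    }

    /** Called when a task completes. Empty groups are left in the map in this sketch. */
    void completed(GroupedTask t) {
        ConcurrentSkipListSet<GroupedTask> group = taskGroups.get(t.group);
        if (group != null) group.remove(t);
    }

    /** Only tasks in the same group are compared: ready if nothing earlier is outstanding. */
    boolean isReady(GroupedTask t) {
        ConcurrentSkipListSet<GroupedTask> group = taskGroups.get(t.group);
        return group == null || group.lower(t) == null;
    }

    public static void main(String[] args) {
        GroupRegistry registry = new GroupRegistry();
        GroupedTask first = new GroupedTask("lease", 1);
        GroupedTask second = new GroupedTask("lease", 2);
        registry.enqueued(first);
        registry.enqueued(second);
        System.out.println(registry.isReady(second)); // false while first is outstanding
        registry.completed(first);
        System.out.println(registry.isReady(second)); // true once first has completed
    }
}
```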


Just some thoughts, based on your comments.


Here's another idea I've been playing with in my mind.

1. Replace the Task runAfter method with:

/** Find the first Task in an Iterable's iteration order on which this
Task depends. Return null if there is no such task.
**/
public Task getDependency(Iterable<Task> candidates);

There would be two typical implementations. If the implementing class has no dependency issues, it would unconditionally return null. If there are dependencies, it would be:


public Task getDependency(Iterable<Task> candidates){
  for(Task candidate : candidates){
    if( some condition involving this and candidate ){
      return candidate;
    }
  }
  return null;
}
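
To make the two typical implementations concrete, here is a small self-contained sketch. The interface is named `DepTask` here only so the example stands alone; `IndependentTask`, `KeyedTask`, and the rule "a task depends on any other task with the same key" are hypothetical stand-ins for "some condition involving this and candidate".

```java
import java.util.Arrays;
import java.util.List;

/** Stand-in for the proposed Task interface, reduced to the one method under discussion. */
interface DepTask {
    /** First task in the Iterable's iteration order on which this task depends, or null. */
    DepTask getDependency(Iterable<DepTask> candidates);
}

/** A class with no dependency issues: unconditionally returns null. */
final class IndependentTask implements DepTask {
    @Override public DepTask getDependency(Iterable<DepTask> candidates) { return null; }
}

/** Hypothetical rule: tasks that share a key must not overtake one another. */
final class KeyedTask implements DepTask {
    final String key;
    KeyedTask(String key) { this.key = key; }

    @Override public DepTask getDependency(Iterable<DepTask> candidates) {
        for (DepTask candidate : candidates) {
            if (candidate != this
                    && candidate instanceof KeyedTask
                    && ((KeyedTask) candidate).key.equals(key)) {
                return candidate; // first hit in iteration order
            }
        }
        return null;
    }

    public static void main(String[] args) {
        KeyedTask x1 = new KeyedTask("x");
        KeyedTask y1 = new KeyedTask("y");
        List<DepTask> candidates = Arrays.<DepTask>asList(y1, new IndependentTask(), x1);
        System.out.println(new KeyedTask("x").getDependency(candidates) == x1); // true
        System.out.println(new KeyedTask("z").getDependency(candidates));       // null
    }
}
```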

I'm thinking of keeping all the Task instances for which the TaskManager is currently responsible in a TreeSet based on reverse arrival order. The Iterable would always be set up to search in increasing age order, so getDependency will return the youngest Task on which this depends.
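
A minimal sketch of that ordering, assuming arrival numbers stand in for Task instances (a larger number means a younger task). `ReverseArrivalSketch` and its method names are hypothetical. Note that `tailSet(E, boolean)` comes from NavigableSet, which was added in Java 6; on JDK 1.5 only the inclusive `SortedSet.tailSet(E)` is available.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

final class ReverseArrivalSketch {
    /** Set ordered by reverse arrival, so plain iteration visits the youngest task first. */
    static TreeSet<Long> newTaskSet() {
        return new TreeSet<Long>(Collections.<Long>reverseOrder());
    }

    /** Tasks strictly older than y, still in increasing age order. */
    static List<Long> olderThan(TreeSet<Long> tasks, long y) {
        return new ArrayList<Long>(tasks.tailSet(y, false));
    }

    public static void main(String[] args) {
        TreeSet<Long> tasks = newTaskSet();
        for (long arrival = 1; arrival <= 4; arrival++) {
            tasks.add(arrival);
        }
        System.out.println(tasks);               // [4, 3, 2, 1]
        System.out.println(olderThan(tasks, 3)); // [2, 1]
    }
}
```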

Incidentally, the business with passing a size parameter along with the List in the current implementation is unnecessary. All the Collection classes come with good sublist or subset capabilities. ArrayList, in particular, can return a List view of any index range.
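
As a small illustration of the sublist point, `List.subList` returns a view over an index range without copying. `SubListSketch` and `firstN` are hypothetical names, and strings stand in for tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubListSketch {
    /** A view of the first `size` entries, carrying the same information as a (List, size) pair. */
    static <T> List<T> firstN(List<T> tasks, int size) {
        return tasks.subList(0, size);
    }

    public static void main(String[] args) {
        List<String> tasks = new ArrayList<String>(Arrays.asList("t0", "t1", "t2", "t3"));
        List<String> view = firstN(tasks, 2);
        System.out.println(view);        // [t0, t1]
        tasks.set(0, "replaced");        // non-structural change to the backing list
        System.out.println(view.get(0)); // replaced (the view is not a copy)
    }
}
```

One caveat with this approach: the view becomes invalid if the backing list is structurally modified (elements added or removed) other than through the view, so it suits a callback made while the list is held stable.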

Consider an add of a Task x. Use x.getDependency on the whole TreeSet. If it returns a hit y, put x in a Set associated with y. We then don't have to consider x for placement in the ready set until y is removed or terminates.

When y is removed or terminates, search only tasks older than y to look for another dependency. There is a good chance there won't be many at that point. If x and y belong to a serial order subset, there won't be any, because y would not have run until all older tasks had finished.
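
The add and completion paths described in the last two paragraphs could be sketched as below. This is an illustration, not a proposed implementation: `WaitSetManager`, `Node`, and the dependency rule (tasks with the same non-null key run in arrival order) are hypothetical, a TreeMap keyed by arrival number stands in for the TreeSet, the class is not thread-safe, removal of a task that is itself still parked is not handled, and the syntax is Java 8.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

final class WaitSetManager {
    /** Hypothetical task: arrival number plus a key; same-key tasks must run in arrival order. */
    static final class Node {
        final long seq;
        final String key;
        Node(long seq, String key) { this.seq = seq; this.key = key; }
        boolean dependsOn(Node other) { return key != null && key.equals(other.key); }
    }

    private final TreeMap<Long, Node> all = new TreeMap<>();        // every task held, by arrival
    private final Map<Node, List<Node>> waiters = new HashMap<>();  // y -> tasks parked behind y
    private final Queue<Node> ready = new ArrayDeque<>();           // tasks with no dependency
    private long nextSeq = 0;

    /** Add a task: scan the whole set, youngest first, for the first dependency. */
    Node add(String key) {
        Node x = new Node(nextSeq++, key);
        place(x, all.descendingMap().values());
        all.put(x.seq, x);
        return x;
    }

    /** y has terminated: re-place its waiters, searching only tasks older than y. */
    void finished(Node y) {
        all.remove(y.seq);
        List<Node> parked = waiters.remove(y);
        if (parked == null) return;
        Iterable<Node> older = all.headMap(y.seq, false).descendingMap().values();
        for (Node x : parked) place(x, older);
    }

    /** Park x behind the first candidate it depends on, or mark it ready. */
    private void place(Node x, Iterable<Node> candidates) {
        for (Node candidate : candidates) {
            if (x.dependsOn(candidate)) {
                waiters.computeIfAbsent(candidate, k -> new ArrayList<>()).add(x);
                return;
            }
        }
        ready.add(x);
    }

    /** Next runnable task, or null if none is ready. */
    Node pollReady() { return ready.poll(); }

    public static void main(String[] args) {
        WaitSetManager manager = new WaitSetManager();
        Node a = manager.add("k");
        manager.add("k");                            // parked behind a
        System.out.println(manager.pollReady() == a); // true
        System.out.println(manager.pollReady());      // null: the second task is still parked
        manager.finished(a);
        System.out.println(manager.pollReady().seq);  // 1
    }
}
```

A parked task costs the dispatcher nothing until the task it waits on finishes, and the rescan on completion only covers older tasks.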

Essentially, this is the usual operating system strategy of putting threads that are not currently runnable in a data structure associated with the reason for non-runnability, outside the priority structure that dispatches runnable threads. This minimizes the cost to the dispatcher of a thread that cannot do anything useful until some disk read finishes.

This approach has its highest cost if the TaskManager has a lot of tasks, and we are adding a Task that depends on none of them but is of a type that might depend on another Task, so it has to scan the entire Iterable.

TreeSet seems attractive because it has a good balance of fast append, scan forwards or backwards from a specified element, and removal, but I am still thinking about data structures, and may experiment with alternatives.


What do you think about this idea?

Patricia
