On 28 July 2011 11:16, Greg Brown <gk_br...@verizon.net> wrote:
>>> I think that making TaskGroup a Group was a mistake on my part. There is no 
>>> need to ensure uniqueness here.
>
> Actually, it just occurred to me that uniqueness, while not strictly 
> required, should certainly be encouraged. If you added the same task to a 
> ParallelTaskGroup's task sequence more than once, you'd basically get 
> serialized behavior, since only one thread could run the task at a time. So 
> perhaps a Group is actually appropriate.

I think that really depends on the intent again.  I agree that in many
cases the supplied Tasks would not contain any duplicates, but is
there a strong reason to actually enforce it rather than just document
the behaviour?  Isn't the primary purpose of ParallelTaskGroup just to
submit and wait?

The documentation might simply state that it will *attempt* to run the
supplied Tasks in parallel, but that it is limited because only one
'copy' of a duplicated Task can run at any time.  I'd be happy enough
with that, and the documentation can 'encourage' users not to supply
duplicates.  I can't see duplicates *breaking* anything, but can see
them *changing* the observed behaviour.

The *attempt* part is important as even non-duplicated Tasks might
still be run in series if they were all created with a reference to
the same 'SingleThreadExecutor' ExecutorService.  This should also be
documented for clarity.


However if the intent is more to guarantee as far as possible that all
submitted tasks will be started at the same time, then insisting on
Group<Task> makes sense (would this be ever required?).  But then so
does the ability to control which ExecutorService is actually used to
execute the tasks.  The 'TaskGroup' could create its own unbounded
ExecutorService (or used one supplied to it) to actually execute each
of the tasks, overriding any ExecutorServices supplied to the Tasks'
constructors.


> Further, it occurred to me that TaskSequence (or SerialTaskGroup) is not 
> really all that useful, since it effectively executes the tasks 
> synchronously. The main value of a Task (or more generally, a thread) is 
> concurrency. Any old procedural code can run tasks in series.

Yeah, that is why I questioned its value.  Iterating over a Sequence
and listening for some
listeners is not overly complex.


> Worse, TaskSequence executes sub-tasks by creating (or at least obtaining) a 
> new thread per task, which isn't really necessary. The same results could be 
> achieved by simply calling the synchronous version of execute() (the one that 
> doesn't take a listener argument) on each task in the list.

This was part of the cause of my confusion with these classes.  I
didn't see why they needed to be Tasks themselves.  It is
straightforward to create and execute new Task if and when I want to,
but I can't extract the executable portion of code from a Task once it
is created.  If the parallel execution code is wrapped in a Task, then
that decision has been made for me.


When I first read the original javadoc comment ...
"Class that runs a group of tasks in parallel and notifies listeners
when all tasks are complete."
I thought it might be a fancy TaskListener implementation, or just a
simple class that iterated over the collection of Tasks, launching
them and waiting for completion using a listener of its own creation
(which is what TaskGroup actually does do) but I didn't think it would
extend Task.


Do you remember if it was a conscious choice for these classes to
extend Task from the outset, or do they do so because they use
TaskListener for notifications?

Making use of the TaskListener interface that is closely associated
with Task is understandable, rather than creating an entirely new
listener class to serve much the same purpose, but it wouldn't be out
of the question to do so.  If there was a dedicated TaskGroupListener
interface, then it would break the connection to Task and there would
be no need for these classes to extend Task in the first place.
(Unless that is part of their design)


As I mentioned before, I was thinking of TaskGroup as a 'ParallelExecutor'.

Would it help to concentrate on the actual execution of the tasks and
waiting for their completion in the 'TaskGroup'/'ParallelExecutor',
rather than the implementation of a Collection too?  (This is less
relevant if you are considering deprecating TaskSequence, but serves
to illustrate the idea).

Then the functionality can be described as 'a class that submits Tasks
for parallel execution and waits until they have all completed.

The class could have a Collection<Task> property, or a static execute
method that takes a Collection<Task> as a parameter.  It would be up
to the user to control what Tasks they submit for execution.  The
'TaskGroup' / 'ParallelExecutor' would just be concerned with
executing and waiting for those tasks to complete.

Then it distills to more of a set of static utility methods
public class TaskGroup {

    // Synchronous methods

    // Submit tasks for parallel execution (using their individual,
internal ExecutorServices) and wait for completion
    public static void execute(final Collection<Task> tasks) {...}

    // Submit tasks to the provided ExecutorService for parallel
execution and wait for completion
    // This requires a Task#setExecutorService(ExecutorService) method
    public static void execute(final Collection<Task> tasks, final
ExecutorService executorService) {...}


    // Asynchronous methods

    // Cleaner (but less helpful) to make the user to wrap the
previous methods in
    // their own Task if they want to run them asynchronously, rather
than having
    // async methods here, but either is possible.

    // Creates and executes a parent Task using the
'parentTaskExecutorService' (or a default ExecutorService if null)
    // The parent Task will submit the supplied Collection of Tasks to
the 'subTaskExecutorService' for parallel execution and wait for their
completion.
    // If 'subTaskExecutorService' is null, the sub-Tasks will be
executed with their internal ExecutorServices.
    // The TaskListener will be called when the parent Task completes.
    // See execute(Collection<Task>, ExecutorService)
    public static void execute(final TaskListener<V>
parentTaskListener, final ExecutorService parentTaskExecutorService,
final Collection<Task> subTasks, final ExecutorService
subTaskExecutorService) {...}

    // More variations...
}

(These could even be named less generically and pushed into Task itself.)


That would allow code like this to work as I had hoped/expected

// Running synchronously, but could be wrapped in a Task if required
// No more than 3 Tasks should be running at any time
TaskGroup.execute(manyTasks, Executors.newFixedThreadPool(3));

// Running asynchronously using a Task created by the utility method
// No more than 3 Tasks should be running at any time
TaskGroup.execute(completionListener, null, manyTasks,
Executors.newFixedThreadPool(3));


(I just typed all of this as it came to me, so I might well have
overlooked some reason why this approach would not work)


> So, I'm actually thinking that:
>
> a) TaskSequence should be deprecated/removed
+1

> b) TaskGroup is actually an appropriate name  :-)
-0 :)


If changes will be made as a result of this thread, then I think Task
should have a setExecutorService(ExecutorService) method too, or its
'executorService' field made final if it is not supposed to be
changed.  My preference would be the former.


Yesterday I changed TaskGroup's javadoc comment from
"Class that runs a group of tasks in parallel and notifies listeners
when all tasks are complete."

to
"Task that runs a group of tasks in parallel and notifies listeners
when all tasks are complete."

but it could go a little further again (and capitalize the class names)
"Task that attempts to execute a Group of Tasks in parallel.  This
Task completes when all sub-Tasks have completed."

And then include some detail about possible side effects from the
ExecutorServices of the sub tasks.
(Obviously this all depends on how the classes evolve.)

Chris

Reply via email to