On 7/22/12 2:14 PM, Gilles Sadowski wrote:
> On Sun, Jul 22, 2012 at 09:27:17AM -0700, Phil Steitz wrote:
>> On 7/21/12 6:17 AM, Gilles Sadowski wrote:
>>> Hi.
>>>
>>> My previous post (with subject "Synchronisation") made me think (again) that
>>> it might be useful to start considering how to take advantage of
>>> multi-threading in Commons Math.
>>> Indeed, it seems that some parts of the library might end up not being used
>>> anymore because their performance simply cannot match competing
>>> implementations that do benefit from parallelization. [The recent example
>>> that comes to mind is the FFT.]
>> This is an interesting question.  I am also -1 on adding
>> dependencies, but it would be a good idea to look at how others have
>> solved the problem of how to support parallel execution by multiple
>> threads without managing threads directly.  Lots of [math]
>> algorithms could be parallelized.  The question is how to
>> effectively coordinate the work without owning or creating the
>> workers.  I would be -0 to any suggestion that involved [math]
>> itself spawning threads,
> I certainly do mean that, although threads are to be managed by the
> utilities in package "java.util.concurrent".
>
>> since that 0) creates management headaches
> If it does, then it's too complex for CM. But it shouldn't for readily
> parallelizable tasks (i.e. processing that can be cut into independent
> sub-tasks).

It is "easy" to spawn a lot of threads in applications.  It is not
as easy to make sure they are all cleaned up on all execution paths.
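
To make the cleanup concern concrete, here is a minimal sketch of what
"internally managed" would have to look like. The class and method names
are hypothetical (nothing like this exists in [math]); the only point is
that the pool is shut down in a finally block, so worker threads do not
outlive the call even when a sub-task fails or the caller is interrupted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of Commons Math.
final class OwnedPoolSum {

    // Sums data[i]^2 using a pool created and destroyed inside this call.
    // Assumes nThreads >= 1.
    static double sumOfSquares(final double[] data, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // Cut the index range into roughly equal, independent chunks.
            int chunk = Math.max(1, (data.length + nThreads - 1) / nThreads);
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                Callable<Double> task = () -> {
                    double s = 0;
                    for (int i = from; i < to; i++) {
                        s += data[i] * data[i];
                    }
                    return s;
                };
                parts.add(pool.submit(task));
            }
            // Aggregate the partial results in submission order.
            double total = 0;
            for (Future<Double> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sub-tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-task failed", e.getCause());
        } finally {
            // Runs on every exit path: normal return, failure, interruption.
            pool.shutdownNow();
        }
    }
}
```

Even in this small case the method creates and tears down threads on
every call, which is exactly the kind of thing a container may not allow.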
>
>> 1) may violate some container contracts 
> The usage of multiple cores would be a user setting (i.e. how many tasks can
> run in parallel).

Some container contracts forbid spawning threads, so it would have to
be possible to disable this.

>
>> and 2) forces execution
>> threads to be in the same process.
> I don't understand that.

If [math] is managing the threads, they have to all be in the same
JVM process.  If, on the other hand, we allow [math] algorithm
subtasks to be executed in parallel by other programs / frameworks
(such as, for example, Hadoop), the computation could be spread
across multiple processes or even physical hosts.
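
As a rough sketch of the "externally managed workers" idea within a
single JVM: the algorithm only describes the sub-tasks and hands them to
an Executor supplied by the caller, so [math] never creates or shuts down
a thread. The class name, method name and nChunks parameter are
hypothetical. Spreading the work across processes or hosts (Hadoop and
the like) would additionally need sub-tasks and partial results that can
be serialized, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of Commons Math.
final class CallerManagedSum {

    // Sums data[i]^2 in nChunks sub-tasks run by the caller's executor.
    // Assumes nChunks >= 1. No threads are created or destroyed here.
    static double sumOfSquares(final double[] data, int nChunks, Executor executor) {
        int chunk = Math.max(1, (data.length + nChunks - 1) / nChunks);
        List<CompletableFuture<Double>> parts = new ArrayList<CompletableFuture<Double>>();
        for (int start = 0; start < data.length; start += chunk) {
            final int from = start;
            final int to = Math.min(data.length, start + chunk);
            Supplier<Double> task = () -> {
                double s = 0;
                for (int i = from; i < to; i++) {
                    s += data[i] * data[i];
                }
                return s;
            };
            // The caller decides where and how this actually runs.
            parts.add(CompletableFuture.supplyAsync(task, executor));
        }
        double total = 0;
        for (CompletableFuture<Double> part : parts) {
            total += part.join();
        }
        return total;
    }
}
```

A caller in a container that forbids new threads could pass a
same-thread executor (Runnable::run) and get plain sequential execution
with no code change on the [math] side.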

>
>>  I think it is worth thinking
>> about how we might support parallel execution by externally managed
>> workers.  An obvious thing to look at is how to break our
>> parallelizable algorithms into pieces that could be executed in
>> Hadoop Map/Reduce jobs.
> I don't know what that is.

Have a look at the Hadoop docs, or Pig.  Both are Apache projects. 
There are also other parallel execution frameworks out there.
>
>> Step 0) is the breaking up part. Then step
>> 1) might be either some examples added to the user guide or custom
>> Pig functions (or examples of how to code them).
> I don't know about that either.
>
> I was rather thinking of using the utilities readily available in the
> Java language standard e.g.:
>   http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html

That is the internally-managed threads approach, which could be
done, but has the limitations mentioned above.

For either approach - managing threads internally, or letting an
external execution framework do it - the first step, as you have
mentioned above, is to identify which algorithms can be
parallelized, how to go about dividing up the work, what data needs
to be shared and how to aggregate the results.
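
To illustrate what that first step could look like for one simple case,
here is a sketch of a mean/variance computation split into a per-chunk
step and a merge step. The PartialMoments class is hypothetical, not
existing [math] API. The per-chunk step uses Welford's update and the
merge uses the standard pairwise combination of (count, mean, sum of
squared deviations). Nothing is shared between chunks, and the merge is
the only aggregation needed, so the same two pieces could be driven by a
thread pool or by a map/reduce job.

```java
// Hypothetical value class for illustration; not part of Commons Math.
final class PartialMoments {
    final long n;       // number of values seen
    final double mean;  // running mean
    final double m2;    // sum of squared deviations from the mean

    PartialMoments(long n, double mean, double m2) {
        this.n = n;
        this.mean = mean;
        this.m2 = m2;
    }

    // "Map" step: summarize one chunk independently of all others.
    static PartialMoments of(double[] chunk) {
        long n = 0;
        double mean = 0;
        double m2 = 0;
        for (double x : chunk) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return new PartialMoments(n, mean, m2);
    }

    // "Reduce" step: combine two summaries into one.
    PartialMoments merge(PartialMoments other) {
        if (n == 0) {
            return other;
        }
        if (other.n == 0) {
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        double mergedMean = mean + delta * other.n / total;
        double mergedM2 = m2 + other.m2 + delta * delta * n * other.n / total;
        return new PartialMoments(total, mergedMean, mergedM2);
    }

    // Unbiased sample variance; NaN when fewer than two values were seen.
    double variance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }
}
```

Usage would be along the lines of
PartialMoments.of(chunkA).merge(PartialMoments.of(chunkB)), with the two
of(...) calls free to run wherever the execution framework puts them.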

Phil
>
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

