>>>>> Chris Marshall <[email protected]> writes:
>>>>> On 3/8/2011 9:02 AM, Ingo Schmid wrote:
>> there are certainly two major features which impact PDL performance:
>> 1. Threading should be aware of multiple CPUs and cores
>> automagically and use them.
> This would be nice to have.
To be honest, I never had a formal class on parallel
programming, so my perception may be a bit naive. Nevertheless,
here is how I see it.
My understanding of this problem is that if, e.g., the
following computation is to be done:
    $a = $b->plus ($c, 0)->mult ($d, 0)->sumover ();
it's the ->sumover () step that should be considered for
parallelization first. Unfortunately, it's also the last
step to actually be “seen” by PDL.
For a sequence of elementwise and aggregation operations like
that, my guess would be to parallelize the computation roughly
as follows (where half () is a ->slice ()-based function that
splits a PDL instance into two halves along its most major
dimension):
{
    # XXXTHREAD is a placeholder for whatever threading API is used.
    my ($b1, $b2) = half ($b);
    my ($c1, $c2) = half ($c);
    my ($d1, $d2) = half ($d);
    my $t1
        = XXXTHREAD->spawn (sub {
              $b1->plus ($c1, 0)->mult ($d1, 0)->sumover ();
          });
    my $t2
        = XXXTHREAD->spawn (sub {
              $b2->plus ($c2, 0)->mult ($d2, 0)->sumover ();
          });
    XXXTHREAD->wait ($t1, $t2);
    my $a
        = PDL->new ([ XXXTHREAD->value ($t1),
                      XXXTHREAD->value ($t2) ])->sumover ();
}
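(Here, I've used the fact that the ->sumover () over an array is
the ->sumover () over the ->sumover ()'s over its subarrays. For
other aggregate functions, the combining step may be similar or
different: a maximum combines the same way, whereas an average
also needs the element counts of the subarrays.)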
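
The splitting scheme above can be tried out without PDL or a
threading library. The following is only a minimal sketch under
stated assumptions: plain Perl array references stand in for PDL
instances, forked child processes connected by pipes stand in
for XXXTHREAD, and the names half, fused_sum, spawn, value and
parallel_fused_sum are all hypothetical helpers, not part of PDL.
The variables are called $bv, $cv and $dv to stay clear of Perl's
special sort variables $a and $b.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: split an array ref into two halves.
sub half {
    my ($x) = @_;
    my $mid = int (@$x / 2);
    return ([ @$x[0 .. $mid - 1] ], [ @$x[$mid .. $#$x] ]);
}

# Sequential reference: the sum over i of (b_i + c_i) * d_i.
sub fused_sum {
    my ($bv, $cv, $dv) = @_;
    my $sum = 0;
    $sum += ($bv->[$_] + $cv->[$_]) * $dv->[$_] for 0 .. $#$bv;
    return $sum;
}

# Run a code ref in a forked child; the child writes its numeric
# result as text to a pipe.  Returns [pid, read handle].
sub spawn {
    my ($code) = @_;
    pipe (my $rd, my $wr) or die "pipe: $!";
    my $pid = fork ();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $rd;
        print {$wr} $code->(), "\n";
        close $wr;              # flushes the result into the pipe
        POSIX::_exit (0);       # leave without touching parent buffers
    }
    close $wr;
    return [ $pid, $rd ];
}

# Collect the result of a spawned task and reap the child.
sub value {
    my ($task) = @_;
    my ($pid, $rd) = @$task;
    my $line = <$rd>;
    close $rd;
    waitpid ($pid, 0);
    die "child $pid produced no result" unless defined $line;
    chomp $line;
    return $line;
}

# Split the inputs, compute both partial sums concurrently, and
# combine them (the sum of the partial sums is the total sum).
sub parallel_fused_sum {
    my ($bv, $cv, $dv) = @_;
    my ($b1, $b2) = half ($bv);
    my ($c1, $c2) = half ($cv);
    my ($d1, $d2) = half ($dv);
    my $t1 = spawn (sub { fused_sum ($b1, $c1, $d1) });
    my $t2 = spawn (sub { fused_sum ($b2, $c2, $d2) });
    return value ($t1) + value ($t2);
}

print parallel_fused_sum ([1, 2, 3, 4], [1, 1, 1, 1], [2, 2, 2, 2]), "\n";  # 28
```

Passing results as text through a pipe is only adequate for a
single number; the cost of forking and of moving larger results
between processes is exactly the kind of overhead that would
decide whether splitting pays off for a given nelem ().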
Once we have the topmost operator, and the nelem () of the PDL
instance to be passed to it, we can decide whether to use
parallel computation for it, by “splitting” it as shown above.
However, the only way I see for the topmost operator to be
“seen” by PDL before its arguments are actually computed is
delayed (AKA lazy) evaluation. This is quite common in certain
languages (and some, like Haskell, are built all around such a
notion), but I've rarely (if at all) seen it done in Perl. And
I'm not sure that extending PDL “to do it the lazy way” would be
at all easy. (And we'd have to make sure that none of the PDL
instances involved in such a computation are modified along the
way, as any such modification would have to wait for the
computation to complete first.)
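
To make the idea of delayed evaluation concrete, here is a small
sketch in plain Perl. It is an illustration under stated
assumptions, not how PDL works: the Lazy package and its methods
are hypothetical, plain array references stand in for PDL
instances, the swap argument of ->plus () and ->mult () is
omitted, and only a single trailing ->sumover () is handled.
Each method merely records the operation; nothing is computed
until ->value () is called, at which point the topmost operator
and the element count are both known.

```perl
use strict;
use warnings;

package Lazy;

# Wrap a plain array ref as a leaf of the expression tree.
sub leaf {
    my ($class, $data) = @_;
    return bless { op => 'leaf', data => $data }, $class;
}

# These only record the operation; no arithmetic happens here.
sub plus {
    my ($self, $other) = @_;
    return bless { op => 'plus', args => [ $self, $other ] }, ref $self;
}
sub mult {
    my ($self, $other) = @_;
    return bless { op => 'mult', args => [ $self, $other ] }, ref $self;
}
sub sumover {
    my ($self) = @_;
    return bless { op => 'sumover', args => [ $self ] }, ref $self;
}

# Number of elements the node would yield once evaluated.
sub nelem {
    my ($self) = @_;
    return scalar @{ $self->{data} } if $self->{op} eq 'leaf';
    return 1                          if $self->{op} eq 'sumover';
    return $self->{args}[0]->nelem;
}

# Element $i of an elementwise subtree, computed on demand.
sub at {
    my ($self, $i) = @_;
    my $op = $self->{op};
    return $self->{data}[$i] if $op eq 'leaf';
    my ($l, $r) = @{ $self->{args} };
    return $l->at ($i) + $r->at ($i) if $op eq 'plus';
    return $l->at ($i) * $r->at ($i) if $op eq 'mult';
    die "at () is not defined for '$op' nodes";
}

# Force the computation.  Only here is the topmost operator known.
sub value {
    my ($self) = @_;
    if ($self->{op} eq 'sumover') {
        my $arg = $self->{args}[0];
        my $n   = $arg->nelem;
        # Decision point: with $n known, the range 0 .. $n - 1
        # could be split between workers instead of looped over.
        my $sum = 0;
        $sum += $arg->at ($_) for 0 .. $n - 1;
        return $sum;
    }
    return [ map { $self->at ($_) } 0 .. $self->nelem - 1 ];
}

package main;

my $expr = Lazy->leaf ([1, 2, 3, 4])
               ->plus (Lazy->leaf ([1, 1, 1, 1]))
               ->mult (Lazy->leaf ([2, 2, 2, 2]))
               ->sumover ();
print $expr->value (), "\n";    # 28
```

Note that each leaf keeps a reference to the caller's array, so
changing that array between building $expr and calling
->value () changes the result. That is the tampering problem
mentioned above, in miniature.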
Still, individual methods could probably be made
“multicore-enabled” without much effort. I'm, however, unsure
about the overall effect it may have upon the performance, etc.
[…]
--
FSF associate member #7257
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
