>>>>> Chris Marshall <[email protected]> writes:
>>>>> On 3/8/2011 9:02 AM, Ingo Schmid wrote:

 >> there are certainly two major features which impact PDL performance:

 >> 1. Threading should be aware of multiple CPUs and cores
 >> automagically and use them.

 > This would be nice to have.

        To be honest, I've never had a formal class on parallel
        programming, so my perception could be a bit naive.  Still,
        here's how I see it.

        My understanding of this problem is that if, e.g., the
        following computation is to be done:

    $a = $b->plus ($c, 0)->mult ($d, 0)->sumover ();

        it's the ->sumover () step that should be considered for
        parallelization first.  Unfortunately, it's also the last one
        to be actually “seen” by PDL, as ->plus () and ->mult () have
        already been evaluated by the time it's called.

        For a sequence of elementwise and aggregation operations like
        that, my guess would be to parallelize the computation roughly
        as follows (where half () is a ->slice ()-based function that
        splits a PDL instance into two halves along its most major
        dimension):

    {
        my ($b1, $b2) = half ($b);
        my ($c1, $c2) = half ($c);
        my ($d1, $d2) = half ($d);

        ## XXXTHREAD is a placeholder for whatever threading API is used
        my $t1
            = XXXTHREAD->spawn (sub {
                  $b1->plus ($c1, 0)->mult ($d1, 0)->sumover ();
              });
        my $t2
            = XXXTHREAD->spawn (sub {
                  $b2->plus ($c2, 0)->mult ($d2, 0)->sumover ();
              });
        XXXTHREAD->wait ($t1, $t2);
        ## no "my" here, so that $a remains visible after the block
        $a
            = PDL->new ([ XXXTHREAD->value ($t1),
                          XXXTHREAD->value ($t2) ])->sumover ();
    }

        (Here, I've used the fact that the ->sumover () over an array
        is the ->sumover () over the ->sumover ()'s over its
        subarrays.  For other aggregate functions, the combining step
        may be similar or different.)
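
        To check the identity used above, here is a minimal sequential
        sketch.  It assumes 1-D PDL instances with at least two
        elements; half () is a hypothetical helper, written here with
        ->slice (), and is not part of PDL.  No threads are involved:
        the two partial results are simply computed one after the
        other.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical helper (not part of PDL): split a 1-D PDL instance
# into two halves using ->slice ().  Assumes at least two elements.
sub half {
    my ($p) = @_;
    my $n   = $p->nelem ();
    my $mid = int ($n / 2);
    return ($p->slice ("0:" . ($mid - 1)),
            $p->slice ("$mid:" . ($n - 1)));
}

my $b = sequence (10);          # 0 .. 9
my $c = ones (10);              # all ones
my $d = sequence (10) + 1;      # 1 .. 10

# The computation done in one piece.
my $whole = $b->plus ($c, 0)->mult ($d, 0)->sumover ();

# The same computation done on the halves, then combined.
my ($b1, $b2) = half ($b);
my ($c1, $c2) = half ($c);
my ($d1, $d2) = half ($d);
my $s1 = $b1->plus ($c1, 0)->mult ($d1, 0)->sumover ();
my $s2 = $b2->plus ($c2, 0)->mult ($d2, 0)->sumover ();
my $split = pdl ($s1->sclr (), $s2->sclr ())->sumover ();

print "whole: $whole, split: $split\n";
```

        For ->maximum (), the same pattern should work with
        ->maximum () as the combining step; for ->average (), the
        partial results would have to be weighted by the sizes of
        the halves.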

        Once we have the topmost operator, and the nelem () of the PDL
        instance to be passed to it, we can decide whether to use
        parallel computation for it, by “splitting” it as shown above.
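
        The decision itself could be as simple as a size check.  The
        sketch below assumes a 1-D PDL instance; the function name
        and the threshold argument are made up for illustration, and
        the two partial sums are computed sequentially where a real
        implementation would hand them to separate workers.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical: sum a 1-D PDL instance, splitting the work in two
# only when the instance has at least $min_nelem elements.
sub sumover_maybe_split {
    my ($p, $min_nelem) = @_;
    my $n = $p->nelem ();

    # Too small to be worth splitting: do it in one piece.
    return $p->sumover ()
        if ($n < $min_nelem || $n < 2);

    my $mid = int ($n / 2);
    # Each of these two would go to its own worker.
    my $s1 = $p->slice ("0:" . ($mid - 1))->sumover ();
    my $s2 = $p->slice ("$mid:" . ($n - 1))->sumover ();
    return $s1 + $s2;
}

# The threshold values here are arbitrary.
print sumover_maybe_split (sequence (10), 4), "\n";     # split
print sumover_maybe_split (sequence (10), 100), "\n";   # not split
```

        A sensible threshold would have to be found by measurement,
        as the cost of starting the workers has to be paid back by
        the time saved.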

        However, the only way I see for the topmost operator to be
        “seen” by PDL before its arguments are actually computed is
        delayed (AKA lazy) evaluation.  This is quite common in certain
        languages (and some, like Haskell, are built all around such a
        notion), but I've rarely (if at all) seen it done in Perl.
        And I'm not sure that extending PDL “to do it the lazy way”
        would be at all easy.  (We'd also have to ensure that none of
        the PDL instances involved in such a computation is modified
        along the way, as any such modification would have to wait
        for the computation to complete first.)
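
        To make the idea of delayed evaluation concrete, here is a
        rough sketch.  The LazyPDL package and its leaf () and
        force () methods are entirely hypothetical and not part of
        PDL; the wrapper merely records the operations as a tree, so
        that the topmost operator is known before anything is
        computed.  It does nothing to protect the underlying PDL
        instances from being modified in the meantime.

```perl
use strict;
use warnings;
use PDL;

package LazyPDL;    # hypothetical wrapper, not part of PDL

# Wrap an actual PDL instance.
sub leaf {
    my ($class, $pdl) = @_;
    return bless ({ op => 'leaf', value => $pdl }, $class);
}

# Record an operation and its arguments without computing anything.
sub _node {
    my ($op, @args) = @_;
    return bless ({ op => $op, args => [ @args ] }, __PACKAGE__);
}

sub plus    { my ($self, $other) = @_; return _node ('plus', $self, $other); }
sub mult    { my ($self, $other) = @_; return _node ('mult', $self, $other); }
sub sumover { my ($self) = @_;         return _node ('sumover', $self); }

# Actually perform the recorded computation, bottom-up.
sub force {
    my ($self) = @_;
    return $self->{value}
        if ($self->{op} eq 'leaf');
    my @v = map { $_->force () } @{ $self->{args} };
    return $v[0]->plus ($v[1], 0) if ($self->{op} eq 'plus');
    return $v[0]->mult ($v[1], 0) if ($self->{op} eq 'mult');
    return $v[0]->sumover ()      if ($self->{op} eq 'sumover');
    die ("unknown operator: $self->{op}");
}

package main;

my $expr
    = LazyPDL->leaf (sequence (4))
        ->plus (LazyPDL->leaf (ones (4)))
        ->mult (LazyPDL->leaf (sequence (4) + 1))
        ->sumover ();

# The topmost operator is visible before any computation is done.
print "topmost operator: $expr->{op}\n";

my $result = $expr->force ();
print "result: $result\n";
```

        A force () along these lines is where the decision to split
        the computation could be made, as it sees the whole tree at
        once.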

        Still, individual methods could probably be made
        “multicore-enabled” without much effort.  I'm unsure, however,
        about the overall effect this would have on performance, etc.

[…]

-- 
FSF associate member #7257

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl