[Chapel-developers] Parallellism (subject to a constraint)

Damian McGuckin Fri, 28 Aug 2020 21:10:53 -0700


Hi,

There exists a need to process the N columns of an array 'U' as

        for j in 1..N do
        {
                // treat column j as the anchor column

                for k in j+1 .. N do  // Stage 'j' processing
                {
                        // mess with operations on column 'j' and 'k'
                }
        }

The inner loops can be treated as independent of each other subject to aconstraint, i.e. there is a need to guarantee (or somehow enforce) that


        when stage 'j' wants to process column 'k', it knows (or can
        check) that processing of that same column 'k' by stage(s) prior,
        i.e. 'j-1', is completed (or can twiddle its thumbs waiting for
        that to happen before proceeding).

That last statement is obviously recursive although it does not need to
be programmed as such.

One can program this (maybe poorly) as

        var stage : [0..N] atomic int;

        stage[0].write(N);
        // define that the 'ghost' precursor stage
        // has completed processing of all columns
        forall j in 1..N-1 do
        {
                // define that this stage has completed NO columns
                stage[j].write(0);
                // this routine must update stage[j] with 'k' when
                // it has finished its own processing of column 'k'
                processStage(j, U, ......);
        }

and then update stage[j] within

        processStage(j, U, ......)

to reflect the column 'last processed', say 'k', in that stage. This
then allows the next stage

        processStage(j+1, U, ......)

to inspect that variable, i.e. stage[j],  during its own operation to
ensure that it does not attempt to use any column 'i' where 'i > k'.

This approach involves waiting which is a big no-no and demands that anydistributed implementation update what amounts to a variable in theprimary locale. This is less than ideal although the overhead is yet tobe quantified. There is an upside. Because one would probably processcolumns in blocks, say of 4 or 8 (or at a pinch 16), the apparent need totest the atomic variable every columns drops by that same blocking factor.So, while the waiting is not so critical (even if it is detrimental to thealgorithm's readability), the atomic variables in the primary locale arestill a worry.


Are there better ways to attack this problem?

And yes, if looking at parallel Jacobi SVD sweeps, there are algorithmsthat try and parallelize that logic very differently. But they do/did nothave the benefit of Chapel at their disposal at the time there were beingdeveloped. And besides, they are quite complex, do even naughtier thingsto the readability of the algorithm. Avoiding them, would really, really,

be a desirable thing.

Thanks - Damian


_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers

[Chapel-developers] Parallellism (subject to a constraint)

Reply via email to