Hi Damian —
Taking your three versions:

Original:

    forall r in rows do {
      const ref ur = u[r, ..];
      for j in cslice do {
        t[r, j] = vmDot(common, ur, vslab[j, ..]);
      }
    }
Forall expr:

    forall r in rows do {
      const ref ur = u[r, ..];
      const x = [j in cslice] vmDot(common, ur, vslab[j, ..]);
      t[r, cslice] = x;
    }
Succinct:

    forall r in rows do {
      const x = [j in cslice] vmDot(common, u[r, ..], vslab[j, ..]);
      t[r, cslice] = x;
    }

The Succinct version slows down seriously, by about 25% or more.
I believe the difference between the final two is a simple case of Chapel not doing loop-hoisting optimizations for non-trivial expressions. Specifically, you and I can see that 'u[r, ..]' is independent of the value of 'j', so it could be evaluated once and reused for all iterations of the 'j' loop, but the Chapel compiler isn't mature enough to do this yet. So your "Forall expr" version gets an improvement by manually hoisting the evaluation of that expression out of the loop.
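Just to illustrate the pattern in isolation, here's a minimal sketch of the same manual-hoisting idea (the names A, B, n, and D are made up for the example, not taken from your code); both loops compute the same result, but the second evaluates the loop-invariant slice only once per outer iteration:

    config const n = 100;
    const D = {1..n, 1..n};
    var A, B: [D] real;

    // Un-hoisted: the slice expression A[i, ..] is re-evaluated on every 'j' iteration
    forall i in 1..n {
      for j in 1..n do
        B[i, j] = + reduce (A[i, ..] * A[j, ..]);
    }

    // Manually hoisted: the loop-invariant slice is evaluated once per 'i' and reused
    forall i in 1..n {
      const ref Ai = A[i, ..];
      for j in 1..n do
        B[i, j] = + reduce (Ai * A[j, ..]);
    }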
The delta between the Original and Forall-expr versions is less obvious, but I would guess it could be due to the use of nested parallelism (though we'd hope the impact would be smaller than 5%, at least for loops with large trip counts). Specifically, by default '[j in cslice]' will be executed in parallel, but it will first check whether there is already a task per core and, if so, serialize the loop. Maybe this execution-time check is adding the 5% overhead? A way to check would be to write the initialization of 'x' as:

    const x = for j in cslice do vmDot(common, ur, vslab[j, ..]);
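In context, that experiment is just the Forall-expr version with the forall expression replaced by a serial for expression, roughly:

    forall r in rows do {
      const ref ur = u[r, ..];
      // serial loop expression: no run-time check for nested parallelism
      const x = for j in cslice do vmDot(common, ur, vslab[j, ..]);
      t[r, cslice] = x;
    }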
If this recovers the lost 5%, I think that's the answer.

-Brad