Brad,

As often happens, you did not solve my problem outright, but you gave me enough extra insight to solve it myself.

On Mon, 31 Aug 2020, Brad Chamberlain wrote:

Original:

        forall r in rows do
        {
            const ref ur = u[r, ..];

            for j in cslice do
            {
                t[r, j] = vmDot(common, ur, vslab[j, ..]);
            }
        }


Forall expr:

  forall r in rows do
  {
      const ref ur = u[r, ..];
      const x = [j in cslice] vmDot(common, ur, vslab[j, ..]);

      t[r, cslice] = x;
  }

A way to check would be to write the initialization of 'x' as:

           const x = for j in cslice do vmDot(common, ur, vslab[j, ..]);

If this returned the lost 5%, I think that's the answer.

Sadly, no. And I should correct myself: it is 6%, not 5%. Both your suggestion and my attempt labelled "Forall expr" above take about 8.5 seconds for my 4000*4000 GEMM on my old 6-core E5-1660.

On the other hand, the attempt labelled Original, which has no intermediate copy, takes about 8 seconds: 7.96, 8.04, and so on.

The problem turns out to be the temporary, which should have been obvious to me. Avoiding the temporary altogether with

        t[r, cslice] = for j in cslice do vmDot(common, ur, vslab[j, ..]);

which I will call the Single Statement approach, recovers that 6%.

Interestingly, the Original approach

             for j in cslice do
             {
                 t[r, j] = vmDot(common, ur, vslab[j, ..]);
             }

takes the same time as the Single Statement approach, so I will stick with the Original.
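
For anyone who wants to try the three variants side by side, here is a minimal sketch. It is not my actual GEMM code: rowDot is a hypothetical stand-in for vmDot (a plain dot product, without the 'common' argument), v stands in for vslab, and n, D, t1, t2 and t3 are names made up for the illustration. It only checks that the three forms compute the same result; it does not time them.

```chapel
// Sketch only: rowDot, n, D, v, t1, t2 and t3 are hypothetical names,
// not taken from the real code discussed in this thread.
config const n = 64;            // small size, assumed for illustration

const rows = 0..#n;
const cslice = 0..#n;
const D = {rows, cslice};

var u: [D] real;
var v: [D] real;                // plays the role of vslab
var t1: [D] real;
var t2: [D] real;
var t3: [D] real;

// Stand-in for vmDot: dot product of two 1D arrays (or row slices).
proc rowDot(a: [] real, b: [] real): real {
  var s = 0.0;
  for (x, y) in zip(a, b) do s += x * y;
  return s;
}

// Fill the inputs with something deterministic.
forall (i, j) in D with (ref u, ref v) {
  u[i, j] = i + j;
  v[i, j] = i - j;
}

// Original: write each element straight into the result.
forall r in rows with (ref t1) {
  const ref ur = u[r, ..];
  for j in cslice do
    t1[r, j] = rowDot(ur, v[j, ..]);
}

// Forall expr: build a temporary array x, then copy it into the row.
forall r in rows with (ref t2) {
  const ref ur = u[r, ..];
  const x = [j in cslice] rowDot(ur, v[j, ..]);
  t2[r, cslice] = x;
}

// Single Statement: assign the loop expression to the row slice directly.
forall r in rows with (ref t3) {
  const ref ur = u[r, ..];
  t3[r, cslice] = for j in cslice do rowDot(ur, v[j, ..]);
}

// Report whether the three variants agree element by element.
writeln(&& reduce (t1 == t2), " ", && reduce (t1 == t3));
```

The explicit "with (ref ...)" intents are my assumption about what newer compilers want when a forall body modifies an outer array; the snippets above in this thread do not use them. To compare timings you would need to add your own timing around each loop and build with chpl --fast.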

Thanks for the insight - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer


_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
