Brad,
As often happens, you did not solve my problem but gave me enough extra
insight to allow me to solve my own problems.
On Mon, 31 Aug 2020, Brad Chamberlain wrote:
Original:
forall r in rows do
{
const ref ur = u[r, ..];
for j in cslice do
{
t[r, j] = vmDot(common, ur, vslab[j, ..]);
}
}
Forall expr:
forall r in rows do
{
const ref ur = u[r, ..];
const x = [j in cslice] vmDot(common, ur, vslab[j, ..]);
t[r, cslice] = x;
}
A way to check would be to write the initialization of 'x' as:
const x = for j in cslice do vmDot(common, ur, vslab[j, ..]);
If this returned the lost 5%, I think that's the answer.
Sadly, no. I should also correct myself: the gap is 6%, not 5%. Both your
suggestion and my attempt labelled "Forall expr" above take about 8.5 seconds
for a 4000x4000 GEMM on my old 6-core Xeon E5-1660.
On the other hand, the version labelled "Original", which has no intermediate
copy, takes about 8 seconds (7.96, 8.04, and so on from run to run).
The problem turns out to be the temporary, which should have been obvious
to me. If I avoid the temporary altogether with
t[r, cslice] = for j in cslice do vmDot(common, ur, vslab[j, ..]);
which I will call the Single Statement approach, that 6% is recovered.
Interestingly, the Original approach
for j in cslice do
{
t[r, j] = vmDot(common, ur, vslab[j, ..]);
}
takes the same time as the Single Statement approach, so I will stick with
it.
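For anyone wanting to reproduce the comparison, here is a minimal, self-contained Chapel sketch that times the three variants side by side. It is a harness under stated assumptions, not the code measured above: vmDot is a hypothetical stand-in (a plain serial dot product that ignores 'common') because the real kernel is not shown, u, vslab and t are assumed to be square n-by-n arrays of reals, and it assumes a recent Chapel (the 'stopwatch' type from the Time module and '0..<n' ranges). Its timings will not match the 8.0 and 8.5 second figures.

```chapel
use Time;

config const n = 1000;   // problem size; override with --n=4000

// Hypothetical stand-in for vmDot(common, ur, vj): the real kernel is not
// shown, so 'common' is ignored and this is a plain serial dot product.
proc vmDot(common: int, const ref a, const ref b): real {
  var s = 0.0;
  for (x, y) in zip(a, b) do s += x * y;
  return s;
}

const D = {0..<n, 0..<n};
var u, vslab, t: [D] real;
u = 1.0;
vslab = 2.0;
const rows = 0..<n, cslice = 0..<n;
const common = 0;
var sw: stopwatch;

// Variant 1 ("Original"): explicit inner loop, no temporary array.
t = 0.0;
sw.clear(); sw.start();
forall r in rows with (ref t) {
  const ref ur = u[r, ..];
  for j in cslice do
    t[r, j] = vmDot(common, ur, vslab[j, ..]);
}
sw.stop();
writeln("Original:         ", sw.elapsed(), " s");

// Variant 2 ("Forall expr"): builds a temporary row 'x', then copies it in.
t = 0.0;
sw.clear(); sw.start();
forall r in rows with (ref t) {
  const ref ur = u[r, ..];
  const x = [j in cslice] vmDot(common, ur, vslab[j, ..]);
  t[r, cslice] = x;
}
sw.stop();
writeln("Forall expr:      ", sw.elapsed(), " s");

// Variant 3 ("Single Statement"): assigns the serial for-expression straight
// into the row slice, so no named temporary array is declared.
t = 0.0;
sw.clear(); sw.start();
forall r in rows with (ref t) {
  const ref ur = u[r, ..];
  t[r, cslice] = for j in cslice do vmDot(common, ur, vslab[j, ..]);
}
sw.stop();
writeln("Single Statement: ", sw.elapsed(), " s");
```

Each variant resets t first, so after any of them every element should equal 2.0*n (n products of 1.0 and 2.0). The explicit 'with (ref t)' clauses make the writes to the outer array legal without relying on an inferred intent.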
Thanks for the insight - Damian
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
_______________________________________________
Chapel-developers mailing list
https://lists.sourceforge.net/lists/listinfo/chapel-developers