Consider the following:
var qr : [1..n] real;
var u : [1..m, 1..n] real;
Let's assume that each row of 'u' must be processed, either 1 at a time,
or in batches of 2 (with an row remaining to be processed alone) or in
batches of 4 (with one, two or three rows remaining to be processed by
themselves). That just an 'unrolling' where the processing in batches
of 4 will be done in parallel.
An example of such processing is the application of Given's planar
rotations to columns j..k of 'u' above. This could be done as:
// do Givens Rotations to all 'm' rows of the columns j..k, in
// 'vmRot' doing this processing in blocks of 4 rows at a time
const jk = j .. k;
var i = 1;
// When 'm' is not a multiple of 4, there are 3 (1+2) or 2 or 1
// rows that remain to be processed separately - do these all up
// front, i.e. 1..1 & 2..3 or just 1..2 or maybe even just 1..1
// where this little things are done serially one after another
for r in 1..2 do if (m & r) != 0 then
{
const rows = i .. i + r - 1;
vmRot(q[jk], u[rows, jk]);
i += r;
}
// do the rest in parallel
{
const rows = i .. m;
[r in rows by 4] vmRot(q[jk], u[r..r+3, jk]);
}
Why do things with Slices
=========================
The use of array slicing self-documents the code that sub-matrices are
being used. But sometimes, there is a need for speed and we throw out
the nicer ways of expressing things.
Making things go faster
=======================
It was determined that an alternative solution
vmRot(jk, q, u[r..r+3, ..]);
is measurably faster than the above but does not convey that vmRot works
on the submatrix [r..r+3, j..k]. Considering the floating point operation
count and number of memory accesses in the benchmark using a matrix with
sometimes 2000 columns, the fact that it is measurable is a bit scary.
Should we give up on slicing altogether and do the above as
[r in rows by 4] vmRot(r..r+3, jk, q, u);
where vmRot must changed to handle submatrices directly. This avoids any
slicing.
But anyway, progressing:
What is the overhead in passing
u[r..r+3, jk]
and is it the same overhead as
u[r..r+3, ..]
Curious minds would like to know.
Regards - Damian
P.S. I have a sense of Deja-Vu here. My 2nd post to the list in 2011 was
about this topic but in a whole different context,
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers