Hi -
> I would have thought the parallel overhead of a 'cobegin' would have
> been much less than that in a 'forall' loop
I'm not certain what is happening in your specific case,
but here are some guesses:
* the forall approach might be more cache-friendly. Modern systems
share some cache levels among cores, and the forall's tasks all work
on parts of the same two vectors, whereas the cobegin's tasks each
use different data.
* by default, the forall tries to detect whether there are already
enough tasks running to keep the system busy. As a result, in a larger
program, the `forall`s might have really low parallel overhead. (You
can pass `--dataParIgnoreRunningTasks=true` to your program to turn
this behavior off and experiment with it.) In contrast, the cobegin
will always create tasks, which can be bad for overhead when those
tasks are small.
* the forall lets you use all 6 cores, but the cobegin only 2 (one per task)
* as far as the Chapel language is concerned, the forall
indicates to the compiler that the body of the loop
is order independent. This can aid vectorization, but
it should not matter with the C backend today. Getting good
vectorization in this case with `--llvm` is something I'm
looking at improving soon.
Best,
-michael
Consider an LAPACK-like linear algebra routine that applies a planar
rotation to two vectors:
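proc rot(r : complex, ref acj : [?acjD] real, ref aci : [?aciD] real)
{
  // each element pair (ai, aj) is treated as the complex number ai + aj*i
  // and multiplied by r; this is a pure rotation when |r| == 1
  for (ai, aj) in zip(aci, acj) do
  {
    const w = r * (ai, aj):complex;
    (ai, aj) = (w.re, w.im);
  }
}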
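For reference, the complex multiply inside 'rot' is the standard planar rotation when r has unit modulus. Writing r = c + s i with c = cos θ and s = sin θ, and x, y for one element pair taken from aci and acj (these symbol names are mine, not from the code), each loop iteration computes:

```latex
\begin{aligned}
r\,(x + y\,\mathrm{i}) &= (c\,x - s\,y) + (s\,x + c\,y)\,\mathrm{i},\\
\begin{pmatrix} x' \\ y' \end{pmatrix} &=
\begin{pmatrix} c & -s \\ s & c \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}.
\end{aligned}
```

That is 4 multiplies and 2 adds per element pair, so one call on vectors of length 1600 does roughly 1600 × 6 = 9600 floating-point operations, which is not much work to divide among tasks.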
I am computing an SVD. With my algorithm, any vector size above 2000
is probably suspect. That is fine because I do not need to go above 1000
for the types of problem I am attacking. Within the algorithm, there are
two independent tasks operating on two independent matrices 'v' and 'a':
rot(tv, v[i..j, k - 1], v[i..j, k])
and
rot(ta, a[m..n, k - 1], a[m..n, k])
These are the major numerical computations in the workload, a QR iteration.
Neither of these two computations depends on the other.
Note that the computations being done with 'rot'
a) use vectors with stride > 1, as they work on rows of a matrix
with column-major ordering, and
b) use vectors of sizes from 6 to 1600 for the problems at hand, which are
largely test cases with well-known solutions.
So, using Chapel code like
cobegin
{
rot(tv, v[i..j, k - 1], v[i..j, k]);
rot(ta, a[m..n, k - 1], a[m..n, k]);
}
seemed like a reasonable way to maybe halve the computation time,
if the 2 'tasks' could run independently of each other.
I only have 6 cores to exploit at this development stage. That normally
gives me a good indication of what will or will not work in parallel.
I left the 'for' loop in 'rot' alone because I did not want it
trying to run in parallel and confusing the scheduler. I thought halving
the run-time would be a more than good performance gain.
But it is slower than serial C for a vector size of 1600.
So, let's throw out the cobegin, run these two calls to 'rot' one after
another, and let the routine 'rot' run its loop in parallel as a forall.
And it is twice as fast as serial C for the same vector size.
Nice!
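A minimal sketch of what the forall version of 'rot' might look like, assuming the same argument order and names as the 'for' version above. The name rotPar, the array v, and the test data are hypothetical, not the code from the post.

```chapel
// Hypothetical forall variant of 'rot' (a sketch, not the code from the post).
// Same argument order and names as the serial version.
proc rotPar(r: complex, ref acj: [] real, ref aci: [] real) {
  // 'forall' declares the iterations order independent, so Chapel may
  // divide the zippered iteration among several tasks
  forall (ai, aj) in zip(aci, acj) {
    // treat (ai, aj) as ai + aj*i and multiply by r
    const w = r * (ai, aj):complex;
    (ai, aj) = (w.re, w.im);
  }
}

// Hypothetical test data: two columns of a 2D array, as in v[i..j, k]
config const n = 1600;
var v: [1..n, 1..2] real;
forall (i, j) in v.domain do v[i, j] = i + 10.0 * j;

// r = i is a rotation by 90 degrees (cos 90 = 0, sin 90 = 1)
const quarterTurn = 0.0 + 1.0i;

// column 1 is passed as acj and column 2 as aci, like rot(tv, v[.., k - 1], v[.., k])
rotPar(quarterTurn, v[1..n, 1], v[1..n, 2]);
```

Here the two calls would still be made one after the other, as in the post, but each call can spread its 1600 iterations over all available cores instead of being limited to one core per call.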
On just 6 cores, with a vector length of 1600, the implementation of SVD
running in parallel is twice the speed of a serial C implementation with
very little work. And the algorithm is still very obvious and readable.
The nice clean code appears to parallelize nicely, but for reasons I do
not fully comprehend.
I would have thought the parallel overhead of a 'cobegin' would have
been much less than that in a 'forall' loop (over what in my cases is
1600 complex numbers being read out of cache, multiplied together, and
then copied back into cache memory).
Is my thinking wrong?
My data is still living in cache, but only just.
I am using 1.20.0 for the moment, not the latest.
Thanks - Damian
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers