Hi Ben,
Michael, I am guessing that the stuff of most interest to yourself is in
the latter half of this email but the rest is background. Hopefully I am
not rambling too much.
On Wed, 2 Sep 2020, Albrecht, Ben wrote:
For example, say task 0 processes column pairs (1..N, 2) serially. After
completing (2,2), task 0 create a new parallel task (task 1) to process
the pairs (2..N, 3).
Not quite. Sorry for my poor explanation. Let's assume that Task #0 is the
controlling process. It starts:
* Task #1 fto processes column PAIRS (row#1, columns#1..N) serially
After Task#1 has completed (1,1), it starts
* Task #2 with qualifications:
- Task#2 is given responsibility for (row#2, columns#1..N) but
- initially you want Task#2 to only process (2,1), i.e. column#1
-- because Task#1 is by now busy processing column#2 and we
-- do not want Task#2 and come along and mess with that work
- Task#2 must check with Task#1 before proceeding to the next column, i.e.
-- it can only process column (2,2) if the parent has finished with
-- (1,2) it, and it can only process (2,3) if the parent has finished
-- with (1,3) and so on until it gets to (2, N).
After Task#2 has completed (2,1) it starts
* Task #2 with qualifications:
- Task#2 is given responsibility for (row#2, columns#1..N) but
... same stuff as above ...
To give each spawned task (that is currently processing row I) the job of
spawning Task#I+1 for row I+1 seems less than optimal. But maybe I am
missing something. Also, Task#I+1 continually needs to check with Task#I
whether it can step to the next column, i.e. to ensure that its creator
(i.e. Task#I) has completed its own processing of the next column that
Task I+1 now wants to process. So, I (maybe wrongly) let Task#0 handle all
of the spawning, which it hopefully does in such a way as to be optimal
for the underlying system (or locales of systems) using a 'forall'.
Additionally, the overhead I saw in my last use of a 'cobegin' was so high
that I now stay away from it. It was awful. I have yet to experiment
seriously with 'coforall'. I very rarely see the need for a begin.
In the task-parallel representation, you are creating the tasks as
needed, rather than spinning them all up at once and having them wait
for work.
Even if I start up the tasks as needed, each new task still needs to
communicate with its creater to know if it is safe to proceed to the next
column. So I cannot see what your approach buys but I am always happy to
learn. I think that I did not clearly explain that each task needs to keep
monitoring what its creator is doing.
However, this may require some effort to reach desired performance. For
example, a naive implementation of this representation would create N
tasks, rather than creating a number tasks appropriate for your hardware
like the forall approach does.
I like the fact that forall creates tasks appropriate to the hardware.
See https://chapel-lang.org/docs/master/primers/taskParallel.html and
https://chapel-lang.org/docs/master/users-guide/index.html#task-parallelism
for more background on task parallelism in Chapel
Thanks for that although I found that this did not explain much. But that
is probably because I am coming from such a beginners-level base. Also,
somebody needs to add something about the overhead in each approach (which
is way beyond my knowledge currently).
This next two paragraphs needs some insight/input from others as they are
related to some past discussions.
In my algorithm, Task#I must keep track of where it is up, so let's define
an array of int's,
stage[0..N] : int
where stage[I] reflects the current column that task#I has completed. Each
element of that array, i.e. stage[I] is updated by only Task#I so there is
no need for the overhead that an atomic variable would involve. Task#I+1
must read stage[I] as it steps through the columns to make sure it does
not try to update column 'K' before Task#I has finished with it. Task#I+1
should run slightly behind Task#I because it starts later. In practice. a
a thread started after another thread may not always run behind a thread
started before it. So, to avoid treading on the heals of its predecessor,
Task#I+1 is continually reading/polling stage[I] which it must read it
from memory to ensure it knows what Task#I is doing. Again no locking is
required. This is your garden variety C volatile variable. How do I handle
it in Chapel? An atomic variable seems like massive overhill as I assume
locking is involved with an atomic. I can access stage[??] through a pair
of custom (tiny) external C routines
void putStage(long *stage, long i, long k) { stage[i] = k; }
and
long getStage(long *stage, long i) { return stage[i]; }
But that seems like I am sticking my head in the sand and avoiding the
underlying problem. And it probably cripples any optimization being done
in the code which is calling these routines.
I still contend that Chapel needs a 'vol[atile]' declaration concept or
something like it. The identifier so declared is much like a 'var[iable]'
but it must always be accessed through, i.e. written-to/read-from, memory.
But I do not know enough of Chapel's big picture so I could be talking
through my hat! I wish I had known enough of Chapel to be a productive
participant in the conversation in 2012 when volatile got removed from
Chapel 1.15.0. Then again, maybe I am talking about peek/poke for atomics
which appears to have lesser overhead than other types of memory ordering.
But again, I have no idea what the overhead for these really is but it all
seems way too high for what I want.
By the way, the online Chapel documentation on atomics does not appear to
explain (or link to an explanation of) memory ordering types. In an older
PDF document, it refers to C11 which then refers to the C++ definition and
other documentation. Even in the current primers
https://chapel-lang.org/docs/primers/atomics.html?highlight=atomic
it says
For more information on Chapels atomics, see the Chapel Language
Specification.
If you click on the Chapel Language Specification you land on a page which
has no reference whatsoever about atomics. If I actually had any solid
grasp of the subject, I would offer to rewrite it, but I do not, which is
why I am reading about it in the first place. I find that by the time I
have gone through all the links to links, I have long forgotten what my
precise original Chapel problem was. Just my 2c. Might be worth putting
onto the to-do list.
Thanks - Damian
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers