Hi Ben,

Michael, I am guessing that the stuff of most interest to yourself is in the latter half of this email but the rest is background. Hopefully I am not rambling too much.

On Wed, 2 Sep 2020, Albrecht, Ben wrote:

For example, say task 0 processes column pairs (1..N, 2) serially. After completing (2,2), task 0 create a new parallel task (task 1) to process the pairs (2..N, 3).

Not quite. Sorry for my poor explanation. Let's assume that Task #0 is the controlling process. It starts:

* Task #1 fto processes column PAIRS (row#1, columns#1..N) serially

After Task#1 has completed (1,1), it starts

* Task #2 with qualifications:
  - Task#2 is given responsibility for (row#2, columns#1..N) but
  - initially you want Task#2 to only process (2,1), i.e. column#1
    -- because Task#1 is by now busy processing column#2 and we
    -- do not want Task#2 and come along and mess with that work
  - Task#2 must check with Task#1 before proceeding to the next column, i.e.
    -- it can only process column (2,2) if the parent has finished with
    -- (1,2) it, and it can only process (2,3) if the parent has finished
    -- with (1,3) and so on until it gets to (2, N).

After Task#2 has completed (2,1) it starts
* Task #2 with qualifications:
  - Task#2 is given responsibility for (row#2, columns#1..N) but
  ... same stuff as above ...

To give each spawned task (that is currently processing row I) the job of spawning Task#I+1 for row I+1 seems less than optimal. But maybe I am missing something. Also, Task#I+1 continually needs to check with Task#I whether it can step to the next column, i.e. to ensure that its creator (i.e. Task#I) has completed its own processing of the next column that Task I+1 now wants to process. So, I (maybe wrongly) let Task#0 handle all of the spawning, which it hopefully does in such a way as to be optimal for the underlying system (or locales of systems) using a 'forall'.

Additionally, the overhead I saw in my last use of a 'cobegin' was so high that I now stay away from it. It was awful. I have yet to experiment seriously with 'coforall'. I very rarely see the need for a begin.

In the task-parallel representation, you are creating the tasks as needed, rather than spinning them all up at once and having them wait for work.

Even if I start up the tasks as needed, each new task still needs to communicate with its creater to know if it is safe to proceed to the next column. So I cannot see what your approach buys but I am always happy to learn. I think that I did not clearly explain that each task needs to keep monitoring what its creator is doing.

However, this may require some effort to reach desired performance. For example, a naive implementation of this representation would create N tasks, rather than creating a number tasks appropriate for your hardware like the forall approach does.

I like the fact that forall creates tasks appropriate to the hardware.

See https://chapel-lang.org/docs/master/primers/taskParallel.html and https://chapel-lang.org/docs/master/users-guide/index.html#task-parallelism for more background on task parallelism in Chapel

Thanks for that although I found that this did not explain much. But that is probably because I am coming from such a beginners-level base. Also, somebody needs to add something about the overhead in each approach (which is way beyond my knowledge currently).

This next two paragraphs needs some insight/input from others as they are related to some past discussions.

In my algorithm, Task#I must keep track of where it is up, so let's define an array of int's,

        stage[0..N] : int

where stage[I] reflects the current column that task#I has completed. Each element of that array, i.e. stage[I] is updated by only Task#I so there is no need for the overhead that an atomic variable would involve. Task#I+1 must read stage[I] as it steps through the columns to make sure it does not try to update column 'K' before Task#I has finished with it. Task#I+1 should run slightly behind Task#I because it starts later. In practice. a a thread started after another thread may not always run behind a thread started before it. So, to avoid treading on the heals of its predecessor, Task#I+1 is continually reading/polling stage[I] which it must read it from memory to ensure it knows what Task#I is doing. Again no locking is required. This is your garden variety C volatile variable. How do I handle it in Chapel? An atomic variable seems like massive overhill as I assume locking is involved with an atomic. I can access stage[??] through a pair of custom (tiny) external C routines

        void putStage(long *stage, long i, long k) { stage[i] = k; }
and
        long getStage(long *stage, long i) { return stage[i]; }

But that seems like I am sticking my head in the sand and avoiding the underlying problem. And it probably cripples any optimization being done
in the code which is calling these routines.

I still contend that Chapel needs a 'vol[atile]' declaration concept or something like it. The identifier so declared is much like a 'var[iable]' but it must always be accessed through, i.e. written-to/read-from, memory. But I do not know enough of Chapel's big picture so I could be talking through my hat! I wish I had known enough of Chapel to be a productive participant in the conversation in 2012 when volatile got removed from Chapel 1.15.0. Then again, maybe I am talking about peek/poke for atomics which appears to have lesser overhead than other types of memory ordering. But again, I have no idea what the overhead for these really is but it all seems way too high for what I want.

By the way, the online Chapel documentation on atomics does not appear to explain (or link to an explanation of) memory ordering types. In an older PDF document, it refers to C11 which then refers to the C++ definition and other documentation. Even in the current primers

        https://chapel-lang.org/docs/primers/atomics.html?highlight=atomic

it says

        For more information on Chapels atomics, see the Chapel Language
        Specification.

If you click on the Chapel Language Specification you land on a page which has no reference whatsoever about atomics. If I actually had any solid grasp of the subject, I would offer to rewrite it, but I do not, which is why I am reading about it in the first place. I find that by the time I have gone through all the links to links, I have long forgotten what my precise original Chapel problem was. Just my 2c. Might be worth putting onto the to-do list.

Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer


_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers

Reply via email to