Re: [Chapel-developers] Parallellism (subject to a constraint)

Damian McGuckin Thu, 03 Sep 2020 22:53:16 -0700


Hi Ben,

Michael, I am guessing that the stuff of most interest to yourself is inthe latter half of this email but the rest is background. Hopefully I amnot rambling too much.


On Wed, 2 Sep 2020, Albrecht, Ben wrote:

For example, say task 0 processes column pairs (1..N, 2) serially. Aftercompleting (2,2), task 0 create a new parallel task (task 1) to processthe pairs (2..N, 3).

Not quite. Sorry for my poor explanation. Let's assume that Task #0 is thecontrolling process. It starts:


* Task #1 fto processes column PAIRS (row#1, columns#1..N) serially

After Task#1 has completed (1,1), it starts

* Task #2 with qualifications:
  - Task#2 is given responsibility for (row#2, columns#1..N) but
  - initially you want Task#2 to only process (2,1), i.e. column#1
    -- because Task#1 is by now busy processing column#2 and we
    -- do not want Task#2 and come along and mess with that work
  - Task#2 must check with Task#1 before proceeding to the next column, i.e.
    -- it can only process column (2,2) if the parent has finished with
    -- (1,2) it, and it can only process (2,3) if the parent has finished
    -- with (1,3) and so on until it gets to (2, N).

After Task#2 has completed (2,1) it starts
* Task #2 with qualifications:
  - Task#2 is given responsibility for (row#2, columns#1..N) but
  ... same stuff as above ...

To give each spawned task (that is currently processing row I) the job ofspawning Task#I+1 for row I+1 seems less than optimal. But maybe I ammissing something. Also, Task#I+1 continually needs to check with Task#Iwhether it can step to the next column, i.e. to ensure that its creator(i.e. Task#I) has completed its own processing of the next column thatTask I+1 now wants to process. So, I (maybe wrongly) let Task#0 handle allof the spawning, which it hopefully does in such a way as to be optimalfor the underlying system (or locales of systems) using a 'forall'.

Additionally, the overhead I saw in my last use of a 'cobegin' was so highthat I now stay away from it. It was awful. I have yet to experimentseriously with 'coforall'. I very rarely see the need for a begin.

In the task-parallel representation, you are creating the tasks asneeded, rather than spinning them all up at once and having them waitfor work.

Even if I start up the tasks as needed, each new task still needs tocommunicate with its creater to know if it is safe to proceed to the nextcolumn. So I cannot see what your approach buys but I am always happy tolearn. I think that I did not clearly explain that each task needs to keepmonitoring what its creator is doing.

However, this may require some effort to reach desired performance. Forexample, a naive implementation of this representation would create Ntasks, rather than creating a number tasks appropriate for your hardwarelike the forall approach does.


I like the fact that forall creates tasks appropriate to the hardware.

See https://chapel-lang.org/docs/master/primers/taskParallel.html andhttps://chapel-lang.org/docs/master/users-guide/index.html#task-parallelismfor more background on task parallelism in Chapel

Thanks for that although I found that this did not explain much. But thatis probably because I am coming from such a beginners-level base. Also,somebody needs to add something about the overhead in each approach (whichis way beyond my knowledge currently).

This next two paragraphs needs some insight/input from others as they arerelated to some past discussions.

In my algorithm, Task#I must keep track of where it is up, so let's definean array of int's,


        stage[0..N] : int

where stage[I] reflects the current column that task#I has completed. Eachelement of that array, i.e. stage[I] is updated by only Task#I so there isno need for the overhead that an atomic variable would involve. Task#I+1must read stage[I] as it steps through the columns to make sure it doesnot try to update column 'K' before Task#I has finished with it. Task#I+1should run slightly behind Task#I because it starts later. In practice. aa thread started after another thread may not always run behind a threadstarted before it. So, to avoid treading on the heals of its predecessor,Task#I+1 is continually reading/polling stage[I] which it must read itfrom memory to ensure it knows what Task#I is doing. Again no locking isrequired. This is your garden variety C volatile variable. How do I handleit in Chapel? An atomic variable seems like massive overhill as I assumelocking is involved with an atomic. I can access stage[??] through a pairof custom (tiny) external C routines


        void putStage(long *stage, long i, long k) { stage[i] = k; }
and
        long getStage(long *stage, long i) { return stage[i]; }

But that seems like I am sticking my head in the sand and avoiding theunderlying problem. And it probably cripples any optimization being done

in the code which is calling these routines.

I still contend that Chapel needs a 'vol[atile]' declaration concept orsomething like it. The identifier so declared is much like a 'var[iable]'but it must always be accessed through, i.e. written-to/read-from, memory.But I do not know enough of Chapel's big picture so I could be talkingthrough my hat! I wish I had known enough of Chapel to be a productiveparticipant in the conversation in 2012 when volatile got removed fromChapel 1.15.0. Then again, maybe I am talking about peek/poke for atomicswhich appears to have lesser overhead than other types of memory ordering.But again, I have no idea what the overhead for these really is but it allseems way too high for what I want.

By the way, the online Chapel documentation on atomics does not appear toexplain (or link to an explanation of) memory ordering types. In an olderPDF document, it refers to C11 which then refers to the C++ definition andother documentation. Even in the current primers


        https://chapel-lang.org/docs/primers/atomics.html?highlight=atomic

it says

        For more information on Chapels atomics, see the Chapel Language
        Specification.

If you click on the Chapel Language Specification you land on a page whichhas no reference whatsoever about atomics. If I actually had any solidgrasp of the subject, I would offer to rewrite it, but I do not, which iswhy I am reading about it in the first place. I find that by the time Ihave gone through all the links to links, I have long forgotten what myprecise original Chapel problem was. Just my 2c. Might be worth puttingonto the to-do list.


Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer


_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers

Re: [Chapel-developers] Parallellism (subject to a constraint)

Reply via email to