I was going to spend my evening writing more parallel c, but I decided it would probably be more relaxing (if less exciting) to write some parallel j instead. So here we are: some parallel reimplementations of primitives, in increasing order of interest:

First, some basic definitions to start us off:

0&T.@'' ::] ^:_'' NB.eat all the cores
NPAR=. 1 T.''

Now, let's try to implement rank, the quintessence of parallel iteration in j. Actually, let's just implement monadic "_1, which iterates in parallel over the major cells of y. The obvious formulation is:

peach0=: <@:((t.'')"_1)

but this is almost certainly not what we want: virtual nouns cannot be passed between threads, so this incurs a copy of each cell, which will be expensive unless u itself is expensive and y's major cells are small. We also need y to have a small number of major cells, or else we will cause scheduler contention (or, in the current implementation, we'll block everything on the main thread).

peach0 is important as an archetype of all forms of parallel iteration, because the limitations I mentioned are fundamental to the current implementation. However, it's necessary to layer additional abstractions on top of this primitive to get an actually performant parallel implementation of "_1.

Here's a first testcase:

   a=. i.4 1e8
   timex'+/"_1 a'
0.225933
   timex'+/peach0 a'
0.911939

The solution to this problem that I presented the other day is to pass a task identifier to the task, and use it to construct the virtual noun there instead of in the creating thread, viz:

peach1=: {{ > ([: u {&y) t.''"_1 i.#y }}

Now we see an improvement!

   timex'+/peach1 a'
0.0738621

This is not completely general, though; it starts up a thread for _every_ major cell of y, so if there are many small cells, it will perform quite badly. A better strategy, then, is to start up NPAR tasks, giving each a contiguous slice of y, and to use the primitive u"n to operate on the cells within that slice. Here I use the dyads {. and }. to construct that slice:

prankb=: {{ par=. NPAR <. #y
 c=. (#y) <.@% par          NB.chunk size
 d=. c*i.par                NB.drops, see also +/\par#c
 t=. (c#~<:par) , c+par|#y  NB.takes, last of which may be oversized
 ([: u"n {&t {. {&d }. y"_) t.''"_1 i.par }}
prank=: ;@:prankb
peach2=: prank _1

Another nice thing about this definition is that, since it also uses the primitive ", it is able to take advantage of IRS (integrated rank support) once hardware parallelism runs out. Performance compares favourably on a transposed a:

   a=. i.1e8 4
   timex'+/"_1 a'
0.458253
   timex'+/peach2 a'
0.311498

but it is not quite so impressive as in the previous case. The majority of the time is actually spent in ;, copying result cells out of boxes, because there are so many results, and each is individually so cheap to compute. It's for this reason I also expose the definition of prankb, which leaves its result chunks boxed, as it may be possible to profit by operating on the results in that form. If, for instance, we try to sum up all the elements in the array, the numbers get nicer:

   timex'+/+/"_1 a'
0.529847
   timex'+/+/&>+/prankb _1 a'
0.130087

We'll make this even faster later.

There is more work to be done on prank; aside from the obvious bug and the lack of ambivalence, it assumes there are enough major cells to satisfy all threads. In the event that there are fewer major cells than NPAR, and n is deeper than _1, it could profit by subdividing along other axes. I haven't bothered doing this yet.

I also made a parallel implementation of x u\ y, but it proved not very interesting, so I won't reproduce it here. It is effectively the same as prankb--including the metacircular reversion to the primitive \--except that d and t need to be computed differently. I think the template used there is a general one; it is essentially how you would write it in c.

Next: reductions.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
