(Note: an even smarter parallel / is possible, given the non-homogeneous performance of u, but it needs primitives with which to build a concurrent queue. A primitive 'wait for any' (instead of 'wait for all') would be of use here--I cannot bear to poll--but proper queues would be better.)

On Tue, 12 Apr 2022, Elijah Stone wrote:

Part 2: reduction.

I cannot in good conscience write an adverb which claims to be a parallel analogue to / but which blindly assumes all verbs are associative. In fact, I will require a property stronger than associativity: that verbs reduced in parallel be monoids. This starts with the following definition:

monoid=: 2 :'x u y NB.MONOID'

x u monoid n y behaves exactly the same as x u y, but the former form serves as an assertion that u is associative and that it has an identity of n. We can then write our first substitute for u/, which checks that u takes the form u monoid n before executing in parallel (and falls back to serial u/ otherwise). That looks like this:

slash=: {{ par=. NPAR <. #y
NB. annoying; no late binding for non-verbs
if. -. u`'' -:&(0&{@,@>) + monoid 0 `'' do.
 u/y return. end.
id=. 1{::^:3,>u
u=. (0{1{::,>u`'') `:6  NB. pull out actual verb to take advantage of SC
if. 0-:#y do. id return. end.
c=. (#y) <.@% par
d=. c*i.par
t=. (c#~<:par) , c+par|#y
> u&.>/ ([: u/ {&t {. {&d }. y"_) t.''"_1 i.par }}

We farm out par threads, each of which reduces a chunk, and then reduce the final results in the main thread before returning. Good enough? It looks the part:
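For readers who do not speak J, the scheme slash implements can be sketched in Python (an illustrative analogue, not the J implementation; the names slash, op, identity, and npar are mine):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def slash(op, identity, xs, npar=4):
    """Chunked parallel reduction: split xs into at most npar chunks,
    reduce each chunk in its own task, then fold the per-chunk results
    serially in the main thread.  The (op, identity) pair plays the
    role of u monoid n: op must be associative with that identity."""
    n = len(xs)
    if n == 0:
        return identity
    par = min(npar, n)
    c = n // par                          # base chunk size
    # the first par-1 chunks get c elements; the last absorbs the remainder
    bounds = [i * c for i in range(par)] + [n]
    chunks = [xs[bounds[i]:bounds[i + 1]] for i in range(par)]
    with ThreadPoolExecutor(max_workers=par) as pool:
        partials = list(pool.map(lambda ch: reduce(op, ch), chunks))
    return reduce(op, partials)           # serial combine in the main thread
```

For instance, slash((lambda a, b: a + b), 0, list(range(100))) gives 4950, the same answer as a plain fold.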

  a=. i.1e8
  timex'+/a'
0.0680711
  timex'+ monoid 0 slash a'
0.0139139

A thousand times no! This code is asymptotically suboptimal. This is clearest when NPAR is _, but alas, it is capped at 14 in the present implementation. We can fake it, however, by making u very expensive and y very short. For instance:

  f=. 0:@(6!:3@0.1)
  timex'f monoid 0 slash 16#0'
1.50183

In other words, we pay for 15 serial applications of f, which is far too many; there should be only 4. slash is linear in depth where it should be logarithmic. We need to continue to parallelise even when there are fewer elements of y than there are available threads, viz. the following updated definition:

slash2=: {{ par=. NPAR <. 2 <.@%~ #y
NB. annoying; no late binding for non-verbs
if. -. u`'' -:&(0&{@,@>) + monoid 0 `'' do.
 u/y return. end.
id=. 1{::^:3,>u
u=. (0{1{::,>u`'') `:6  NB. pull out actual verb to take advantage of SC
if. 0-:#y do. id return. end.
c=. (#y) <.@% par
d=. c*i.par
t=. (c#~<:par) , c+par|#y
y=. >([: u/ {&t {. {&d }. y"_) t.''"_1 i.par
u monoid id slash2^:(1~:#) y }}

then:

  timex'f monoid 0 slash2 16#0'
0.400691

Much better!

(Note: in practice, there would be a minimum number of elements processed by a given task, which would in many cases be >2; coming up with this minimum requires a cost model for u. But 2 is asymptotically fine and serves its pedagogical purpose here.)
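The recursive scheme can be sketched in Python (an illustrative analogue, not the J implementation; the names slash2, op, identity, and npar are mine):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def slash2(op, identity, xs, npar=4):
    """Recursive parallel reduction: each round splits xs into at most
    min(npar, len(xs)//2) chunks, reduces the chunks in parallel, then
    recurses on the per-chunk results.  The number of rounds is thus
    logarithmic in len(xs), where a single fan-out/serial-combine pass
    is linear."""
    if len(xs) == 0:
        return identity
    if len(xs) == 1:
        return xs[0]
    par = min(npar, len(xs) // 2)         # every chunk holds >= 2 elements
    c = len(xs) // par
    bounds = [i * c for i in range(par)] + [len(xs)]
    chunks = [xs[bounds[i]:bounds[i + 1]] for i in range(par)]
    with ThreadPoolExecutor(max_workers=par) as pool:
        partials = list(pool.map(lambda ch: reduce(op, ch), chunks))
    return slash2(op, identity, partials, npar)
```

Because each round at least halves the number of values in play, the serial depth is O(log n) rather than the O(n) combine that the first version pays for.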

Another way of seeing the performance difference between slash and slash2 is to display the execution tree, by analogy to https://www.jsoftware.com/help/dictionary/samp19.htm:

  ]r0=. ,&< monoid '' slash 16#0
┌─┬─────────────────────────────────────────────────────────┐
│0│┌─┬─────────────────────────────────────────────────────┐│
│ ││0│┌─┬─────────────────────────────────────────────────┐││
│ ││ ││0│┌─┬─────────────────────────────────────────────┐│││
│ ││ ││ ││0│┌─┬─────────────────────────────────────────┐││││
│ ││ ││ ││ ││0│┌─┬─────────────────────────────────────┐│││││
│ ││ ││ ││ ││ ││0│┌─┬─────────────────────────────────┐││││││
│ ││ ││ ││ ││ ││ ││0│┌─┬─────────────────────────────┐│││││││
│ ││ ││ ││ ││ ││ ││ ││0│┌─┬─────────────────────────┐││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││0│┌─┬─────────────────────┐│││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││0│┌─┬─────────────────┐││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││0│┌─┬─────────────┐│││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││0│┌─┬─────────┐││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││0│┌─┬─────┐│││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││0│┌─┬─┐││││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││0│0│││││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ │└─┴─┘││││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ │└─┴─────┘│││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ │└─┴─────────┘││││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ ││ │└─┴─────────────┘│││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ ││ │└─┴─────────────────┘││││││││││
│ ││ ││ ││ ││ ││ ││ ││ ││ │└─┴─────────────────────┘│││││││││
│ ││ ││ ││ ││ ││ ││ ││ │└─┴─────────────────────────┘││││││││
│ ││ ││ ││ ││ ││ ││ │└─┴─────────────────────────────┘│││││││
│ ││ ││ ││ ││ ││ │└─┴─────────────────────────────────┘││││││
│ ││ ││ ││ ││ │└─┴─────────────────────────────────────┘│││││
│ ││ ││ ││ │└─┴─────────────────────────────────────────┘││││
│ ││ ││ │└─┴─────────────────────────────────────────────┘│││
│ ││ │└─┴─────────────────────────────────────────────────┘││
│ │└─┴─────────────────────────────────────────────────────┘│
└─┴─────────────────────────────────────────────────────────┘
  ]r1=. ,&< monoid '' slash2 16#0
┌─────────────────────────────┬─────────────────────────────┐
│┌─────────────┬─────────────┐│┌─────────────┬─────────────┐│
││┌─────┬─────┐│┌─────┬─────┐│││┌─────┬─────┐│┌─────┬─────┐││
│││┌─┬─┐│┌─┬─┐│││┌─┬─┐│┌─┬─┐│││││┌─┬─┐│┌─┬─┐│││┌─┬─┐│┌─┬─┐│││
││││0│0│││0│0│││││0│0│││0│0│││││││0│0│││0│0│││││0│0│││0│0││││
│││└─┴─┘│└─┴─┘│││└─┴─┘│└─┴─┘│││││└─┴─┘│└─┴─┘│││└─┴─┘│└─┴─┘│││
││└─────┴─────┘│└─────┴─────┘│││└─────┴─────┘│└─────┴─────┘││
│└─────────────┴─────────────┘│└─────────────┴─────────────┘│
└─────────────────────────────┴─────────────────────────────┘

r1 is completely balanced, but r0 is lopsided. The depths of these trees also agree with the previous run times:

  L.r0
15
  L.r1
4
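The depths themselves follow from simple arithmetic: a left fold nests one application per element after the first, while a balanced combine halves the element count each round. Sketched in Python:

```python
import math

def fold_depth(n):
    # serial u/ : one nested application per element after the first
    return n - 1

def balanced_depth(n):
    # a balanced combine halves the element count each round
    return math.ceil(math.log2(n))
```

fold_depth 16 is 15 and balanced_depth 16 is 4, matching L.r0 and L.r1 above (and, scaled by the 0.1s cost of f, the 1.5s and 0.4s timings).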

Parallel monoid reduction has been studied extensively, e.g. by Fortress. Also well-studied is the parallelisation of monoid/\, particularly +/\ (parallel prefix sum).
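The standard decomposition that makes monoid/\ parallelisable can itself be sketched in Python (a sketch with serial loops standing in for the chunk-parallel passes; the names chunked_scan, op, identity, and npar are mine):

```python
from itertools import accumulate

def chunked_scan(op, identity, xs, npar=4):
    """Two-pass chunked scan: (1) scan each chunk independently,
    (2) exclusive-scan the chunk totals to get per-chunk offsets,
    then apply each offset across its chunk.  Pass 1 and the final
    offsetting are chunk-parallel; associativity of op is what makes
    the regrouping valid."""
    n = len(xs)
    if n == 0:
        return []
    par = min(npar, n)
    c = n // par
    bounds = [i * c for i in range(par)] + [n]
    chunks = [xs[b:e] for b, e in zip(bounds, bounds[1:])]
    scans = [list(accumulate(ch, op)) for ch in chunks]          # pass 1
    totals = [s[-1] for s in scans[:-1]]
    offsets = list(accumulate([identity] + totals, op))          # pass 2
    return [op(off, v) for off, s in zip(offsets, scans) for v in s]
```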
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
