Re: [Chapel-developers] Reduce bug in Chapel 1.22

Brad Chamberlain Tue, 28 Jul 2020 16:38:19 -0700


Hi Damian —

Sorry not to keep up with this thread lately. Last week was very busy andI'm still catching up.

I was probably unclear. What I meant by "wholesale" is that when thelanguage is told "reduce this 2D array by columns" or "by rows", a partialreduction operator permits the implementation to have each task iterateover its local portion of the 2D array in whatever order it wants (e.g., acache-friendly one) accumulating values into a local scratch vector beforecombining them with those of other tasks. So the "wholesale" was meant toreflect the idea that a partial reduction specifies the operation as awhole permitting the implementation to implement it all at once.

By contrast, typical workarounds when partial reductions are not availabletend to do things like compute a complete reduction over a row-sized sliceof the 2D array, one row at a time, which is a lot more constrained interms of how it's computed. This is a big part of why we want partialreductions in the language. E.g., imagine something like:


        colVect = + reduce(...something specifying "across rows")
                      (Mat * rowVect);

as opposed to something like:

        forall r in rows do
          colVect[r] = + reduce (Mat[r, ..] * rowVect);

(where both of these are intended as rough sketches). The overheads ofthis second form for a shared-memory Mat are not great; for adistributed-memory Mat, they're even more embarrassing.

I'm surprised your serial version would still be 60% behind Fortran; itseems like we ought to be able to reach parity without too much trouble.Can you share the code?


-Brad




Brad Chamberlain  Principal Engineer
Cray, a Hewlett Packard Enterprise company
901 Fifth Ave, Suite 1000 | Seattle, WA 98164
+1-206-701-2077  [email protected]  www.cray.com


On Tue, 28 Jul 2020, Damian McGuckin wrote:

On Mon, 6 Jul 2020, Brad Chamberlain wrote:
I want to note that the pattern you're expressing here is what we'd call apartial reduction in Chapel; a long-intended but not-yet-completed feature,the motivation for it being that expressing a row- or column-wise reductionlike this can be done much more efficiently when computed wholesale ratherthan a row or column at a time, particularly for distributed arrays.There's no good excuse for why we haven't implemented it yet other than alack of users breathing down our neck over it.
I have a version which (in parallel) computes the elements of the result
vector using dot products but it is seriously affected by cache misses.
That great in parallel. But comparing the code in serial mode against a
Fortran program looks pretty bad.
I can replace this reduction by some serial code and I got it to within a 60%penalty over the Fortran. But then that is not parallelizable, so I cannotwin.
Let me know when these 'partial reductions' work.  Not urgent.
I cannot figure out how to do that wholesale operation you mention in my case(without consuming inordinate amounts of temporary memory). The
operation is

        Z := (I - lambda * x * y^T) * Z
where Z is a matrix, x and y are vectors, and lambda is a reciprocal of ascalar.
Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers

Re: [Chapel-developers] Reduce bug in Chapel 1.22

Reply via email to