Re: [Chapel-developers] Reduce bug in Chapel 1.22

Brad Chamberlain Mon, 03 Aug 2020 22:10:46 -0700


Hi Damian —

Sounds like you've largely sorted this out, but just to touch on a few points:

> As:
>
>         var y: [1..columns] real = 0:real;
>
>         for r in rows do
>         {
>                 y += x[r] * z[r, ..];
>         }
>
> where y and x are vectors. It is a direct mapping. What's wrong with that?

Modulo the row- vs. column-major memory issues you eventually arrived at,
Chapel's support for rank-change slicing is currently known to be fairly
heavyweight, so the expression 'z[r, ..]' is, unfortunately, not cheap at
present. I expect that this accounts for a fair amount of the overhead.

> Silly me, I decided. Let's use iterators. Now the code is:
>
>         y = 0;
>         for (r, c) in zD do // effectively called with ref z : [?zD] real
>         {
>                 y[c] += x[r] * z[r, c];
>         }

This one seems to me to be fairly well-written serial Chapel code, in that
the 2D iteration over zD will proceed in row-major order, which matches how
Chapel lays out its arrays in memory, and you'll get good re-use on one of
your vectors (x) and multiple sweeps over the other (y). If you wanted to
push this further, it'd be interesting to break the loop into pure scalar
loops (not that we want people to do this forever) and see whether tucking
that repeated read of x[r] into a scalar helps. That is:

        for r in rows {
          const xr = x[r];
          for c in cols {
            y[c] += xr * z[r,c];
          }
        }

This is the sort of thing (hoisting the loop-invariant read of x[r] out of
the inner loop) that one would really like the Chapel and/or C compilers to
do automatically given the previous loop, but I don't happen to know how
successful we tend to be at optimizing it (and it could also depend on the
choice and version of C compiler, obviously).
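To check that the hoisted-scalar loop computes the same result as the
2D-domain loop, here is a small self-contained sketch. The sizes (nRows,
nCols) and the fill values for x and z are hypothetical, chosen only so that
every product and partial sum is exactly representable; it demonstrates
equivalence, not performance.

```chapel
// Hypothetical sizes, chosen only for illustration.
config const nRows = 4, nCols = 3;

const rows = 1..nRows, cols = 1..nCols;
const zD = {rows, cols};

var x: [rows] real;
var z: [zD] real;

// Fill x and z with small integer values so the sums are exact.
for r in rows do x[r] = r;
for (r, c) in zD do z[r, c] = r * 10 + c;

// Variant 1: iterate over the 2D domain (row-major order).
var y1: [cols] real = 0.0;
for (r, c) in zD do
  y1[c] += x[r] * z[r, c];

// Variant 2: explicit nested loops with x[r] hoisted into a scalar.
var y2: [cols] real = 0.0;
for r in rows {
  const xr = x[r];
  for c in cols do
    y2[c] += xr * z[r, c];
}

// Both compute y[c] = sum over r of x[r] * z[r, c].
writeln(y1);
writeln(y2);
writeln(&& reduce (y1 == y2));
```

With the default sizes, each y[c] works out to 300 + 10*c (the sum of r*r
over 1..4 is 30 and the sum of r is 10), so both variants should yield
310, 320 and 330.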

Note that none of this is meant to suggest that this is the preferred way
to write it in Chapel; writing it the cleaner ways (which is how we'd like
to see it written) and then embarrassing us into tuning the Chapel code is
completely valid as well. And of course partial reductions ought to end up
being both super-clean and efficient. But I am curious how close we can
get to Fortran.

You mention the vectorization as well, and that's something we're
continuing to look into in the context of the LLVM back-end.

> The second is fascinating. I would never have guessed that an O(n) write to
> memory would have 20% of the impact of O(n^2) floating-point operations.

Welcome to the exciting world of modern architectures and cost models! (?)

> I will put working partial reductions on my Santa wish-list for 2021 now!

Sounds good.

-Brad

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
