On 09/11/2018 19:11, Jeff Law wrote:
There's a ton of work related to reduction setup, updates and teardown.
  I don't guess there's any generic code we can/should be re-using.  Sigh.

I'm not sure what can be shared, or not, here. For OpenMP we don't have any special code, but OpenACC is much closer to the metal, and AMD GCN does things somewhat differently to NVPTX.

WRT your move patterns.  I'm a bit concerned about using distinct
patterns for so many different variants.  But they mostly seem confined
to vector variants.  Be aware you may need to squash them into a single
pattern over time to keep LRA happy.

As you might guess, the move patterns have been really difficult to get right. The added dependency on the EXEC register tends to put LRA into an infinite loop, and the fact that GCN vector moves are always scatter/gather (rather than a contiguous load/store from a base address) makes spills rather painful.
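
As a rough illustration of the scatter/gather point, here is a small C model. It is only a sketch, not code from the GCN back end: `vreg`, `gather` and `load_contiguous` are made-up names, and the 64-lane width and 64-bit mask simply mirror a GCN wavefront and its EXEC register. Every active lane loads from its own address, so even a "contiguous" load first has to materialise a full vector of per-lane addresses.

```c
#include <stdint.h>

#define LANES 64  /* lanes in one wavefront (assumed for this model) */

/* Hypothetical model of a vector register: one 32-bit value per lane.  */
typedef struct { int32_t lane[LANES]; } vreg;

/* Gather: each lane whose bit is set in EXEC loads from its own address.
   Inactive lanes keep their previous contents.  */
void
gather (vreg *dst, int32_t *const addr[LANES], uint64_t exec)
{
  for (int i = 0; i < LANES; i++)
    if (exec & (UINT64_C (1) << i))
      dst->lane[i] = *addr[i];
}

/* There is no "load LANES words from BASE" primitive in this model, so a
   contiguous load must first build the per-lane addresses BASE + i.  */
void
load_contiguous (vreg *dst, int32_t *base, uint64_t exec)
{
  int32_t *addr[LANES];
  for (int i = 0; i < LANES; i++)
    addr[i] = base + i;
  gather (dst, addr, exec);
}
```

In this model a spill or reload of one vector register needs a second vector's worth of addresses, plus a mask that has the right lanes enabled, before any data can move.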

Thanks for your review, I'll have a V2 patch-set soonish.

Andrew
