This is the third rework of the patchset previously posted on September 5th and November 16th. As before, the series contains the non-OpenACC/OpenMP portions of a port to AMD GCN3 and GCN5 GPU processors. It's sufficient to build single-threaded programs, with vectorization in the usual way. C and Fortran are supported, C++ is not supported, and the other front-ends have not been tested. The OpenACC/OpenMP/libgomp portion will follow at some point after this series has been committed.

Compared to the v2 patchset, patch 1, "Fix IRA ICE", has been dropped, and a new, unrelated patch 1 has been added: "Fix LRA bug".

The IRA issue has now been solved by reworking the move instructions in the back-end so that they no longer require explicit mention of the EXEC register (this is now managed mostly by the md_reorg pass). I also took the opportunity to rework the EXEC use throughout the machine description (something I've been wanting to get to for ages); the primary instruction patterns no longer use vec_merge, and there are "_exec" variants defined (mostly via define_subst) for use by specific expanders and so that combine can optimize conditional vector moves. Additionally, the patterns that choose which unit to use for scalar operations now only clobber the relevant condition register (via a match_scratch), not both of them.

The new LRA issue was exposed by the above changes, but would affect any target where patterns referring to an eliminable register might also include a "scratch" register.

I've also addressed the various feedback I received from patch reviewers.

-- 
Andrew Stubbs
Mentor Graphics / CodeSourcery