Hi!

On Fri, Jan 05, 2024 at 06:27:05PM -0500, Michael Meissner wrote:
> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers. These accumulators are each tied to sets of 4 FPR registers. When

Four VSX registers -- the FP registers are only a 64 bit part of each of
those. Please do not call those VSX registers "FPRs". They are not.

> These patches add support for the 512-bit accumulators within the dense math
> system, and for allocation of the 1,024-bit DMRs. At this time, no additional
> built-in functions will be done to support any dense math features other than
> doing data movement between the DMRs and the VSX registers. Before we can look
> at adding any new dense math support other than data movement, we need the GCC
> compiler to be able to allocate and use these DMRs.

Okido.

> If you compile with -mcpu=power10, the wD constraint will match the equivalent
> FPR register that overlaps with the accumulator. If you compile with
> -mcpu=future, the wD constraint will match the DMR register and not the FPR
> register.
>
> These patches also modifies the print_operand %A output modifier to print out
> DMR register numbers if -mcpu=future, and continue to print out the FPR
> register number divided by 4 for -mcpu=power10.

Yup. Unfortunately that is the best we can do probably. It _feels_
fragile, but it will probably be okay in practice.

> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.

But it will never become completely transparent. Luckily the old thing
will over time fade into the background.

So, please post the -mcpu=future patches in a separate series, first.
I'll comment on that patch in a minute, you'll probably want to take
those comments into consideration before posting that series ;-)


Segher
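
For reference, here is a minimal C sketch of the register overlap discussed
above. It assumes the Power10 layout in which accumulator N overlaps VSX
registers 4N through 4N+3. The helper names are hypothetical and not part of
GCC. The division only mirrors what the quoted text describes for %A under
-mcpu=power10.

```c
#include <assert.h>

/* Power10 MMA: 8 accumulators, each overlapping 4 consecutive VSX registers
   (VSX registers 0..31 in total).  */
enum { NUM_ACC = 8, VSRS_PER_ACC = 4 };

/* Hypothetical helper: the accumulator number that corresponds to a given
   overlapping VSX register number, i.e. the register number divided by 4.  */
unsigned acc_of_vsr (unsigned vsr)
{
  assert (vsr < NUM_ACC * VSRS_PER_ACC);
  return vsr / VSRS_PER_ACC;
}

/* Hypothetical helper: the first of the 4 VSX registers that an
   accumulator overlaps.  */
unsigned first_vsr_of_acc (unsigned acc)
{
  assert (acc < NUM_ACC);
  return acc * VSRS_PER_ACC;
}
```

With -mcpu=future the DMRs are separate registers, so no such mapping
applies there. This sketch only covers the -mcpu=power10 case.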