Hi!

On Fri, Jan 05, 2024 at 06:27:05PM -0500, Michael Meissner wrote:
> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers.  These accumulators are each tied to sets of 4 FPR registers.  When

Four VSX registers -- the FP registers are only a 64-bit part of each of
those.  Please do not call those VSX registers "FPRs".  They are not.

> These patches add support for the 512-bit accumulators within the dense math
> system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
> built-in functions will be done to support any dense math features other than
> doing data movement between the DMRs and the VSX registers.  Before we can look
> at adding any new dense math support other than data movement, we need the GCC
> compiler to be able to allocate and use these DMRs.

Okido.

> If you compile with -mcpu=power10, the wD constraint will match the equivalent
> FPR register that overlaps with the accumulator.  If you compile with
> -mcpu=future, the wD constraint will match the DMR register and not the FPR
> register.
> 
> These patches also modify the print_operand %A output modifier to print out
> DMR register numbers if -mcpu=future, and continue to print out the FPR
> register number divided by 4 for -mcpu=power10.

Yup.  Unfortunately that is the best we can do probably.  It _feels_
fragile, but it will probably be okay in practice.

> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.

But it will never become completely transparent.  Luckily the old thing
will over time fade into the background.

So, please post the -mcpu=future patches in a separate series, first.
I'll comment on that patch in a minute; you'll probably want to take
those comments into consideration before posting that series ;-)


Segher
