https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112886

            Bug ID: 112886
           Summary: We need a new print_operand output modifier for vector
                    double
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meissner at gcc dot gnu.org
  Target Milestone: ---

I've been working on vector double support to reduce memory latency for
specialized applications.  While the work I've been doing might not make it
into GCC 14, I've been looking at what is needed to provide usable asm support
for vector pairs.

The problem is we have %x<n> for VSX registers that maps the traditional FPR
registers into 0..31 and the traditional Altivec registers into 32..63.  We
have %L<n> that returns the 2nd register in a multiple register object. 
However, we don't have a combination of %x<n> and %L<n>, where for VSX
registers it would return the 2nd register in the vector pair as a VSX register
number.
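
As a rough illustration of the numbering described above, here is a toy model
in C.  It is not GCC's print_operand code; the enum and function names are
made up for this sketch, and the third function only models what the missing
modifier would print.

```c
#include <assert.h>

/* Toy model of the register numbering discussed above.  This is not
   GCC's print_operand implementation; all names are hypothetical.  */
enum reg_class { REG_FPR, REG_ALTIVEC };

/* %x<n>: VSX register number.  FPRs map to 0..31, Altivec registers
   map to 32..63.  INDEX is the register's number within its class.  */
int vsx_regno (enum reg_class cls, int index)
{
  assert (index >= 0 && index <= 31);
  return cls == REG_FPR ? index : index + 32;
}

/* %L<n>: number of the 2nd register of a multiple register object,
   still numbered within the register's own class.  */
int second_regno (int index)
{
  return index + 1;
}

/* The missing combination: the 2nd register of the pair, printed as
   a VSX register number.  */
int vsx_second_regno (enum reg_class cls, int index)
{
  return vsx_regno (cls, index) + 1;
}
```

For a vector pair held in Altivec registers 2 and 3, this model gives 34 for
%x<n> and 3 for %L<n>, while the missing modifier would give 35.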

For example, if you wanted to write a loop where you use vector pairs to load
the values and then manually process each vector, you might want to write the
following, using a new modifier (written here as %S<n>) to access the 2nd
vector register:

__vector_pair *p, sum;
size_t i, n;
// ...
__asm__ ("xxspltib %x0,0\nxxspltib %S0,0" : "=wa" (sum));
for (i = 0; i < n; i++)
    __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %S0,%S1,%S2"
             : "=wa" (sum)
             : "wa" (sum), "wa" (p[i]));

However, without this new print_operand output modifier, you would have to use
either the "d" or "f" constraint to limit the operands to the traditional FPR
registers, i.e.:

__vector_pair *p, sum;
size_t i, n;
// ...
__asm__ ("xxspltib %0,0\nxxspltib %L0,0" : "=f" (sum));
for (i = 0; i < n; i++)
    __asm__ ("xvadddp %0,%1,%2\n\txvadddp %L0,%L1,%L2"
             : "=f" (sum)
             : "f" (sum), "f" (p[i]));

If you do this, you limit the number of vector pairs that can be used to 16
instead of 32.  Generally you would want to use vector pairs in
performance-critical code, where you are often using all of the registers.

You can't just modify %L<n> to deal with VSX registers, because the user might
be using an instruction that only accesses Altivec registers, e.g.:

__vector_pair *p, sum;
size_t i, n;
// ...
__asm__ ("vspltisw %0,0\nvspltisw %L0,0" : "=v" (sum));
for (i = 0; i < n; i++)
    __asm__ ("vadduqm %0,%1,%2\n\tvadduqm %L0,%L1,%L2"
             : "=v" (sum)
             : "v" (sum), "v" (p[i]));
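
To make the conflict concrete, here is a small sketch in C.  It is not GCC or
assembler code, and the function names are hypothetical.  It relies on the
fact that Altivec instructions such as vadduqm encode each register in a 5-bit
field, and it assumes a vector pair starts on an even register.

```c
#include <assert.h>

/* Hypothetical check, not GCC or assembler code: an Altivec
   instruction encodes each register in a 5-bit field, so only
   0..31 is valid there.  */
int fits_altivec_field (int regno)
{
  return regno >= 0 && regno <= 31;
}

/* What %L<n> prints today for a pair starting at Altivec register
   INDEX (assumed even, 0..30).  */
int altivec_second (int index)
{
  assert (index >= 0 && index <= 30 && index % 2 == 0);
  return index + 1;
}

/* What %L<n> would print if it were changed to use VSX numbering.  */
int vsx_second (int index)
{
  return altivec_second (index) + 32;
}
```

With a pair in Altivec registers 2 and 3, the current %L<n> yields 3, which
vadduqm accepts.  A VSX-numbered %L<n> would yield 35, which does not fit the
instruction's register field.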
