https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112886
Bug ID: 112886
Summary: We need a new print_operand output modifier for vector double
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: meissner at gcc dot gnu.org
Target Milestone: ---

I've been working with vector double support to provide faster memory latency
for specialized applications. While the work I've been doing might not make it
into GCC 14, I've been looking at what is needed to provide usable asm support
for using vector pairs.

The problem is that we have %x<n> for VSX registers, which maps the traditional
FPR registers into 0..31 and the traditional Altivec registers into 32..63. We
have %L<n>, which returns the 2nd register in a multiple register object.

However, we don't have a combination of %x<n> and %L<n>, where for VSX
registers it would return the 2nd register in the vector pair as a VSX register
number.

For example, if you wanted to write a loop where you use vector pairs to load
the values and then manually process each vector, you might want to write it
using %S<n> to access the 2nd vector register:

        __vector_pair *p, sum;
        size_t i, n;

        // ...

        __asm__ ("xxspltib %x0,0\nxxspltib %S0,0" : "=wa" (sum));

        for (i = 0; i < n; i++)
          __asm__ ("xvadddp %x0,%x1,%x2\n\txvadddp %S0,%S1,%S2"
                   : "=wa" (sum)
                   : "wa" (sum), "wa" (p[i]));

However, without this new print_operand output modifier, you would have to use
either "d" or "f" to limit the registers to the traditional FPR registers,
i.e.:

        __vector_pair *p, sum;
        size_t i, n;

        // ...

        __asm__ ("xxspltib %0,0\nxxspltib %L0,0" : "=f" (sum));

        for (i = 0; i < n; i++)
          __asm__ ("xvadddp %0,%1,%2\n\txvadddp %L0,%L1,%L2"
                   : "=f" (sum)
                   : "f" (sum), "f" (p[i]));

If you do this, you limit the number of vector pairs that can be used to 16
instead of 32. Generally you would want to use this in performance critical
code, and often there you are using all of the registers.

You can't just modify %L<n> to deal with VSX registers, because the user might
be using an instruction that only accesses Altivec registers, i.e.:

        __vector_pair *p, sum;
        size_t i, n;

        // ...

        __asm__ ("vspltisw %0,0\nvspltisw %L0,0" : "=v" (sum));

        for (i = 0; i < n; i++)
          __asm__ ("vadduqm %0,%1,%2\n\tvadduqm %L0,%L1,%L2"
                   : "=v" (sum)
                   : "v" (sum), "v" (p[i]));
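
To make the numbering issue concrete, here is a rough sketch of what the three modifiers would print for a vector pair. It is only a model of the behavior described above, not GCC's print_operand code: the struct pair_reg type and the print_x, print_L, and print_S helpers are hypothetical names made up for illustration, and it assumes %L<n> numbers the second register within its own register file.

```c
#include <assert.h>

/* Hypothetical model for illustration only; these names are not GCC internals. */
enum reg_file { REG_FPR, REG_ALTIVEC };

struct pair_reg {
  enum reg_file file; /* register file holding the vector pair */
  int index;          /* number of the first register within that file, 0..31 */
};

/* Models %x<n>: FPRs print as 0..31, Altivec registers as 32..63. */
static int print_x(struct pair_reg r)
{
  return r.file == REG_FPR ? r.index : r.index + 32;
}

/* Models %L<n>: the second register, numbered within its own register file
   (an assumption of this sketch). */
static int print_L(struct pair_reg r)
{
  return r.index + 1;
}

/* Models the requested modifier (written %S<n> in this report):
   the second register of the pair, printed as a VSX register number. */
static int print_S(struct pair_reg r)
{
  return print_x(r) + 1;
}

/* Checks the model on one pair in each register file. */
static void check_model(void)
{
  struct pair_reg fpr = { REG_FPR, 4 };
  struct pair_reg vr = { REG_ALTIVEC, 4 };

  /* For an FPR pair, %L and the new modifier agree. */
  assert(print_x(fpr) == 4 && print_L(fpr) == 5 && print_S(fpr) == 5);

  /* For an Altivec pair they differ: 5 suits vadduqm, 37 suits xvadddp. */
  assert(print_x(vr) == 36 && print_L(vr) == 5 && print_S(vr) == 37);
}
```

In this model the two modifiers only diverge for pairs living in the Altivec half of the VSX register file, which is why changing the meaning of %L<n> would break Altivec-only instructions while leaving VSX instructions without a way to name the second register.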