On 04/24/2015 06:32 PM, Jan Hubicka wrote:
> Also I believe it was kind of Richard's design decision to avoid use of
> (paradoxical) subregs for vector conversions because these have funny
> implications.

Yes indeed.

> The code for handling upper parts of paradoxical subregs is controlled by
> macros around SUBREG_PROMOTED_VAR_P but I do not think it will handle
> V1DI->V2DI conversions fluently without some middle-end hacking. (it will
> probably try to produce zero extensions)
> 
> When we are on SSE instructions, it would be great to finally teach
> copy_by_pieces/store_by_pieces to use vector instructions (these are more
> compact and either equally fast or faster on some CPUs). I hope to get into
> this, but it would be great if someone beat me.

Well, I think it would be worthwhile to teach the i386 backend how to do 64-bit
vectors in SSE registers.  First, this would aid portability with other targets
that may have GCC generic vectors written only for 8 byte quantities.  Since we
do have zero-extending 8 byte load/store insns for SSE, we don't actually need
paradoxical subregs, just additional macro-ization of the existing patterns.
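
To make the portability point concrete, here is a minimal sketch of the kind of
source in question: GCC generic vector code written for an 8 byte quantity.
The vector_size attribute is the documented GCC extension; the type name v8qi
and the helper add_packed_bytes are made up for illustration, and nothing here
says which registers a given target ends up using.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC generic vector: eight 8-bit lanes in one 8 byte quantity.
   The name v8qi is only illustrative.  */
typedef uint8_t v8qi __attribute__ ((vector_size (8)));

static_assert (sizeof (v8qi) == 8, "v8qi must be an 8 byte vector");

/* Hypothetical helper: add two packed 64-bit words lane by lane.
   Each lane wraps modulo 256; no carry crosses a lane boundary.  */
uint64_t
add_packed_bytes (uint64_t x, uint64_t y)
{
  v8qi a, b, r;
  uint64_t out;

  memcpy (&a, &x, sizeof a);
  memcpy (&b, &y, sizeof b);
  r = a + b;                    /* element-wise vector addition */
  memcpy (&out, &r, sizeof out);
  return out;
}
```

Whether the addition above becomes one vector insn or is lowered to scalar code
depends on the target's support for 8 byte vector modes, which is the gap being
discussed.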

This almost certainly would conflict with the MMX code generation.  But given
the problems we've always had with that, perhaps it's time to kill that off.
To a large extent we can preserve source compatibility with MMX builtins once
we have 8-byte vectors implemented in SSE.

As for the subject, we'd want to delay expansion of DImode arithmetic until
after RA.  That bypasses all of the good work done in lower-subreg.c, so we
need some sort of replacement.

I was wondering this morning about the possibility of a kind of constraint that
would allow RA to generate pairs of registers via CONCAT.  That is, the two
hard registers within the CONCAT are collectively the double-word allocation,
but need not be sequential like current multi-word allocations.  A target using
such a constraint is promising to handle the CONCAT either by splitting (and
gen_lowpart et al), or print_operand letters (e.g. the m68k %R, for outputting
the low part of a pair).
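
As a rough illustration only (a toy model in plain C, not GCC internals), the
idea can be pictured as a pair of register numbers that jointly hold one
double-word value without being adjacent, with a double-word add split into an
add and an add-with-carry on the halves.  All names below (hard_reg, reg_pair,
set_pair, get_pair, add_pair) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_HARD_REGS 8

/* Toy register file of 32-bit hard registers.  */
uint32_t hard_reg[NUM_HARD_REGS];

/* Hypothetical stand-in for (concat lo hi): two hard register numbers
   that together hold one double-word value and need not be sequential.  */
struct reg_pair
{
  int lo;
  int hi;
};

/* Store a 64-bit value into the two halves of the pair.  */
void
set_pair (struct reg_pair p, uint64_t v)
{
  assert (p.lo >= 0 && p.lo < NUM_HARD_REGS);
  assert (p.hi >= 0 && p.hi < NUM_HARD_REGS);
  assert (p.lo != p.hi);
  hard_reg[p.lo] = (uint32_t) v;
  hard_reg[p.hi] = (uint32_t) (v >> 32);
}

/* Read the 64-bit value held by the pair.  */
uint64_t
get_pair (struct reg_pair p)
{
  return ((uint64_t) hard_reg[p.hi] << 32) | hard_reg[p.lo];
}

/* dst += src, split into word-sized pieces: add the low halves, then
   add the high halves plus the carry out of the low addition.  */
void
add_pair (struct reg_pair dst, struct reg_pair src)
{
  uint32_t lo = hard_reg[dst.lo] + hard_reg[src.lo];
  uint32_t carry = lo < hard_reg[src.lo];   /* unsigned wraparound */

  hard_reg[dst.hi] = hard_reg[dst.hi] + hard_reg[src.hi] + carry;
  hard_reg[dst.lo] = lo;
}
```

For example, a pair {1, 6} and a pair {4, 0} are both valid here even though
neither uses consecutive register numbers; the split add only ever touches the
individual halves.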

With that, we get the best of both -- lower-subreg effectively happening in RA,
and DImode arithmetic in SSE, no subregs required.


r~
