On Fri, Feb 14, 2014 at 2:23 PM, Allan Sandfeld Jensen
li...@carewolf.com wrote:
Hello gcc
I have been looking at optimizations of pixel-format conversion recently and
have noticed that gcc does take advantage of SSE4a extrq, BMI1 bextr TBM
bextri or BMI2 pext instructions when it could be useful.
As far as I can tell it should not be that hard. A bextr expression can
typically be recognized as ((x s) mask) or ((x s1)) s2). But I am
unsure where to do such a matching since the mask needs to have specific form
to be valid for bextr, so it seems it needs to be done before instruction
selection.
Secondly the bextr instruction in itself only replace two already fast
instructions so is very minor (unless extracting variable bit-fields which is
harder recognize). The real optimization comes from being able to use pext
(parallel bit extract), which can implement several bextr expressions in
parallel.
So, where would be the right place to implement such instructions. Would it
make sense to recognize bextr early before we get to i386 code, or would it be
better to recognize it late. And where do I put such instruction selection
optimizations?
Motivating example:
unsigned rgb32_to_rgb16(unsigned rgb32) {
unsigned char red = (rgb32 19) 0x1f;
unsigned char green = (rgb32 10) 0x3f;
unsigned char blue = rgb32 0x1f;
return (red 11) | (green 5) | blue;
}
can be implemented as pext(rgb32, 0x001f3f1f)
We have a special pass that already deals with similar patterns,
the bswap pass in tree-ssa-math-opts.c. It does symbolic
execution to produce the composition of a value. It currently
handles byte-shifts only I think (not shifting by 19 or 10) but
this is certainly the way I'd recognize pext() (and other generic
shuffles supported by vector ISAs).
You'd have to extend the representation it uses to handle
these more arbitrary shifts/masks of course.
Richard.
Best regards
`Allan Sandfeld