Hello gcc

I have been looking at optimizations of pixel-format conversion recently and 
have noticed that gcc does take advantage of SSE4a extrq, BMI1 bextr TBM 
bextri or BMI2 pext instructions when it could be useful.

As far as I can tell it should not be that hard. A bextr expression can 
typically be recognized as ((x >> s)  & mask) or ((x << s1)) >> s2). But I am 
unsure where to do such a matching since the mask needs to have specific form 
to be valid for bextr, so it seems it needs to be done before instruction 
selection.

Secondly the bextr instruction in itself only replace two already fast 
instructions so is very minor (unless extracting variable bit-fields which is 
harder recognize).  The real optimization comes from being able to use pext 
(parallel bit extract), which can implement several bextr expressions in 
parallel. 

So, where would be the right place to implement such instructions. Would it 
make sense to recognize bextr early before we get to i386 code, or would it be 
better to recognize it late. And where do I put such instruction selection 
optimizations?

Motivating example:

unsigned rgb32_to_rgb16(unsigned rgb32) {
        unsigned char red = (rgb32 >> 19) & 0x1f;
        unsigned char green = (rgb32 >> 10) & 0x3f;
        unsigned char blue = rgb32  & 0x1f;
       return (red << 11) | (green << 5) | blue;
}

can be implemented as pext(rgb32, 0x001f3f1f)

Best regards
`Allan Sandfeld

Reply via email to