On Mon, Mar 24, 2014 at 01:33:11PM +0100, Richard Biener wrote:
> On Mon, Mar 24, 2014 at 1:25 PM, Uros Bizjak <[email protected]> wrote:
> > Hello!
> >
> >> On Mon, Mar 24, 2014 at 12:13 PM, Ulrich Drepper <[email protected]> wrote:
> >>>> Your patch is correct IMHO, but maybe it's worth adding all the missing
> >>>> `mm512_set1*' stuff?
> >>>>
> >>>> According to trunk and [1], we're still missing (besides those you
> >>>> mentioned) the _mm512_set1_epi16 and _mm512_set1_epi8 broadcasts.
> >>>
> >>> Yes, more are missing, but I think those will need new builtins. The
> >>> _ps and _pd don't require additional instructions.
> >>>
> >>> _mm512_set1_epi16 might have to map to vpbroadcastw. _mm512_set1_epi8
> >>> might have to map to vpbroadcastb. I haven't seen a way to generate
> >>> those instructions if needed and so this work was out of scope for now
> >>> due to time constraints. I agree, they should be added as quickly as
> >>> possible to avoid releasing headers with incomplete APIs.
> >>>
> >>> What is the verdict on checking these changes in? Too late for the
> >>> next release?
> >>
> >> This kind of change can also be made for 4.9.1, for example.
> >
> > OTOH, these changes are isolated to intrinsic header files, and we
> > have a quite extensive testsuite for these. I see no problem with
> > checking in these changes even at this stage.
> >
> > So, if there is no better solution, I propose to check these changes
> > in, since the benefit to users outweighs the (minor) risk. Would this be
> > OK from the RM POV, also weighing in the benefits to users?
>
> Yes, sure. I just meant that it's OK to do more work for 4.9.1, too.

But if, say, for _mm512_set1_epi8 you have no intrinsics, just do something
similar to what _mm256_set_epi8 and _mm256_set1_epi8 do; the compiler should
be smart enough to recognize those as broadcasts.
The following is recognized well:

typedef char v32qi __attribute__((vector_size (32)));

v32qi foo (char a)
{
  return (v32qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}

This isn't:

typedef char v64qi __attribute__((vector_size (64)));

v64qi foo (char a)
{
  return (v64qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}
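
To make the header-side suggestion concrete, here is a minimal sketch of a
set1-style wrapper written as a plain vector initializer, in the spirit of what
_mm256_set1_epi8 does. The name my_set1_epi8_512 is hypothetical and is not an
existing intrinsic. The result is stored through a pointer, which is only an
assumption made so the sketch builds without -mavx512f; a real header would
return the vector by value. Whether the compiler turns this into a single
broadcast depends on the V64QImode support discussed in this thread.

```c
/* 64 x 8-bit lanes, using the GCC generic vector extension.  */
typedef char v64qi __attribute__((vector_size (64)));

/* Hypothetical helper, not a real intrinsic: put A into every lane
   with a plain vector initializer and leave the instruction choice
   to the compiler.  */
static inline void
my_set1_epi8_512 (v64qi *out, char a)
{
  *out = (v64qi) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                   a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}
```

The code is correct regardless of how well the pattern is recognized; only the
quality of the generated code differs.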

But I believe it has already been discussed that the V32HImode and V64QImode
support is incomplete in 4.9. While I think there are no direct broadcasts
for these modes, one can, e.g., use AVX2 broadcasts and then duplicate the
result into the 512-bit mode.
See http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00757.html
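
At the source level, the "broadcast into 256 bits, then duplicate" idea can be
sketched roughly as follows. This only illustrates the data flow: the helper
name my_set1_epi8_via_256 is hypothetical, the memcpy-based duplication is an
assumption chosen for portability, and nothing here guarantees which
instructions the backend will actually emit.

```c
#include <string.h>

typedef char v32qi __attribute__((vector_size (32)));
typedef char v64qi __attribute__((vector_size (64)));

/* Hypothetical helper: build the 512-bit broadcast from a 256-bit one.  */
static inline void
my_set1_epi8_via_256 (v64qi *out, char a)
{
  /* Step 1: the 32-lane broadcast, written in the form that is
     already recognized.  */
  v32qi half = { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
                 a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };

  /* Step 2: copy that 256-bit value into the low and the high half
     of the 512-bit result.  */
  memcpy ((char *) out, &half, sizeof half);
  memcpy ((char *) out + sizeof half, &half, sizeof half);
}
```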

	Jakub