Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
> Do you mean a v8qi->v8hi widening subtract or a v16qi->v8hi widening
> subtract?  

I mean the latter; that seemed to be what richi was suggesting previously.

> The problem with the latter is that we need to fill the
> extra unused elements with something and remove them later.

That's fair enough; using fake/don't-care elements is a bit of a hack. I'll try
something out with separate conversions and a regular subtract.

thanks for the help,
Joel


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Joel Hutton  writes:
>>> So emit a v4qi->v8qi gimple conversion
>>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>>
>>I was thinking of using a v8qi->v8hi convert on each operand followed
>>by a normal v8hi subtraction.  That's what we'd generate if the target
>>didn't define the widening patterns.
>
> Is there a reason that conversion is preferred? If we use a widening subtract
> then we don't need to rely on RTL fusing later.

Do you mean a v8qi->v8hi widening subtract or a v16qi->v8hi widening
subtract?  The problem with the latter is that we need to fill the
extra unused elements with something and remove them later.
And I don't think Richard liked the idea of having separate
v8qi->v8hi and v16qi->v8hi widening subtracts.
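The cost of filling the unused elements can be shown with a small C model using GCC's generic vector extensions and vector subscripting. This is a sketch, not GCC's implementation. The type and function names are hypothetical, the filler value is arbitrarily zero, and "low" is taken to mean lanes 0..7, which only matches little-endian lane numbering.

```c
/* Hypothetical types: 8 x u8 (V8QI), 16 x u8 (V16QI), 8 x u16 (V8HI).  */
typedef unsigned char v8uqi __attribute__ ((vector_size (8)));
typedef unsigned char v16uqi __attribute__ ((vector_size (16)));
typedef unsigned short v8uhi __attribute__ ((vector_size (16)));

/* Vector-vector CTOR: put the V8QI input in lanes 0..7 of a V16QI.
   Lanes 8..15 carry no information; zero is used only because some
   value has to be chosen.  */
static v16uqi
pad_v8uqi (v8uqi x)
{
  v16uqi r = { 0 };
  for (int i = 0; i < 8; i++)
    r[i] = x[i];
  return r;
}

/* Model of a "widen minus low" on V16QI inputs: only lanes 0..7 are
   read, so the filler lanes built by pad_v8uqi are dead work that a
   later pass would have to recognise and remove.  */
static v8uhi
widen_minus_lo_v16uqi (v16uqi a, v16uqi b)
{
  v8uhi r = { 0 };
  for (int i = 0; i < 8; i++)
    r[i] = (unsigned short) (a[i] - b[i]);  /* wraps modulo 2^16 */
  return r;
}
```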

Relying on RTL fusing doesn't seem too bad TBH.

Thanks,
Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>> So emit a v4qi->v8qi gimple conversion
>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>
>I was thinking of using a v8qi->v8hi convert on each operand followed
>by a normal v8hi subtraction.  That's what we'd generate if the target
>didn't define the widening patterns.

Is there a reason that conversion is preferred? If we use a widening subtract
then we don't need to rely on RTL fusing later.


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Joel Hutton  writes:
>>>> In practice this will only affect targets that choose to use mixed
>>>> vector sizes, and I think it's reasonable to optimise only for the
>>>> case in which such targets support widening conversions.  So what
>>>> do you think about the idea of emitting separate conversions and
>>>> a normal subtract?  We'd be relying on RTL to fuse them together,
>>>> but at least there would be no redundancy to eliminate.
>>>
>>> So in vectorizable_conversion for the widen-minus you'd check
>>> whether you can do a v4qi -> v4hi and then emit a conversion
>>> and a wide minus?
>>
>>Yeah.
>
> This seems reasonable, as I recall we decided against adding
> internal functions for the time being as all the existing vec patterns
> code would have to be refactored.

FWIW, that was for the hi/lo part.  The internal function in this case
would have been a normal standalone operation that makes sense independently
of the hi/lo pairs, and could be generated independently of the vectoriser
(e.g. from match.pd patterns).

Using an internal function is actually less work than using a tree code,
because you don't need to update all the various tree_code switch
statements.

> So emit a v4qi->v8qi gimple conversion
> then a regular widen_lo/hi using the existing backend patterns/optabs?

I was thinking of using a v8qi->v8hi convert on each operand followed
by a normal v8hi subtraction.  That's what we'd generate if the target
didn't define the widening patterns.
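As a rough illustration of that shape, the C below uses GCC's generic vector extensions and `__builtin_convertvector` (available from GCC 9) to write an unsigned V8QI -> V8HI widening subtract as two conversions followed by an ordinary V8HI subtraction. The type and function names are hypothetical. Whether a target then fuses the three operations into a single `usubl` depends on the RTL fusing discussed in this thread.

```c
/* Hypothetical types: 8 x u8 in a 64-bit vector (V8QI) and
   8 x u16 in a 128-bit vector (V8HI).  */
typedef unsigned char v8uqi __attribute__ ((vector_size (8)));
typedef unsigned short v8uhi __attribute__ ((vector_size (16)));

/* Widen each operand separately, then do a normal V8HI subtraction.
   No lanes are padded, so there is nothing redundant to remove.  */
static v8uhi
widen_minus_v8uqi (v8uqi a, v8uqi b)
{
  v8uhi wa = __builtin_convertvector (a, v8uhi);  /* zero-extend */
  v8uhi wb = __builtin_convertvector (b, v8uhi);  /* zero-extend */
  return wa - wb;  /* element-wise, wraps modulo 2^16 like usubl */
}
```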

Thanks,
Richard



Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>>> In practice this will only affect targets that choose to use mixed
>>> vector sizes, and I think it's reasonable to optimise only for the
>>> case in which such targets support widening conversions.  So what
>>> do you think about the idea of emitting separate conversions and
>>> a normal subtract?  We'd be relying on RTL to fuse them together,
>>> but at least there would be no redundancy to eliminate.
>>
>> So in vectorizable_conversion for the widen-minus you'd check
>> whether you can do a v4qi -> v4hi and then emit a conversion
>> and a wide minus?
>
>Yeah.

This seems reasonable, as I recall we decided against adding
internal functions for the time being as all the existing vec patterns
code would have to be refactored. So emit a v4qi->v8qi gimple conversion
then a regular widen_lo/hi using the existing backend patterns/optabs?


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Richard Biener  writes:
>> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>> >> >>
>> >> >> Hi Richard(s),
>> >> >>
>> >> >> I'm just looking to see if I'm going about this the right way, based 
>> >> >> on the discussion we had on IRC. I've managed to hack something 
>> >> >> together, I've attached a (very) WIP patch which gives the correct 
>> >> >> codegen for the testcase in question 
>> >> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
>> >> >> obviously need to support other widening patterns and differentiate 
>> >> >> between big/little endian among other things.
>> >> >>
>> >> >> I added a backend pattern because I wasn't quite clear which changes 
>> >> >> to make in order to allow the existing backend patterns to be used 
>> >> >> with a V8QI, or how to represent V16QI where we don't care about the 
>> >> >> top/bottom 8. I made some attempt in optabs.c, which is in the patch 
>> >> >> commented out, but I'm not sure if I'm going about this the right way.
>> >> >
>> >> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
>> >> > confined to vectorizable_conversion.  The
>> >> > only complication might be sub-optimal code-gen for the vector-vector
>> >> > CTOR compensating for the input
>> >> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>> >>
>> >> Yeah.  I don't really like this because it means that it'll be
>> >> impossible to remove the redundant work in gimple.  The extra elements
>> >> are just a crutch to satisfy the type system.
>> >
>> > We can certainly devise a more clever way to represent a paradoxical subreg,
>> > but at least the actual operation (WIDEN_MINUS_LOW) would match what
>> > the hardware can do.
>>
>> At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
>> operations.  E.g. the low-part intrinsic for signed 8-bit integers is:
>>
>>    int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);
>>
>> whereas the high-part intrinsic is:
>>
>>    int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);
>>
>> So representing the low part as a 128-bit → 128-bit operation is already
>> a little artificial.
>
> That's intrinsics, but I guess the actual machine instruction is different?

FWIW, the instructions are the same.  E.g. for AArch64 it's:

   ssubl   v0.8h, v0.8b, v1.8b

(8b being a 64-bit vector and 8h being a 128-bit vector) instead of:

   ssubl   v0.8h, v0.16b, v1.16b

The AArch32 lowpart is:

   vsubl.s8 q0, d0, d1

where a q register joins together two d registers.

>> > OTOH we could simply accept half of a vector for
>> > the _LOW (little-endian) or _HIGH (big-endian) op and have the expander
>> > deal with subreg frobbing?  Not that I'd like that very much though, even
>> > a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
>> > how to go about endianness here ... the _LOW/_HIGH paints us into some
>> > corner here)
>>
>> I think it only makes sense for the low part.  But yeah, I guess that
>> would work (although I agree it doesn't seem very appealing :-)).
>>
>> > A new IFN (direct optab?) means targets with existing support for _LO/HI
>> > do not automatically benefit which is a shame.
>>
>> In practice this will only affect targets that choose to use mixed
>> vector sizes, and I think it's reasonable to optimise only for the
>> case in which such targets support widening conversions.  So what
>> do you think about the idea of emitting separate conversions and
>> a normal subtract?  We'd be relying on RTL to fuse them together,
>> but at least there would be no redundancy to eliminate.
>
> So in vectorizable_conversion for the widen-minus you'd check
> whether you can do a v4qi -> v4hi and then emit a conversion
> and a wide minus?

Yeah.

Richard

> I guess as long as vectorizer costing behaves
> as if the op is fused that's a similarly OK trick as a V_C_E or a
> vector CTOR.
>
> Richard.
>
>> Thanks,
>> Richard
>> >
>> >> As far as Joel's patch goes, I was imagining that the new operation
>> >> would be an internal function rather than a tree code.  However,
>> >> if we don't want that, maybe we should just emit separate conversions
>> >> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>> >>
>> >> Thanks,
>> >> Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Biener via Gcc-patches
On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
> >> >>
> >> >> Hi Richard(s),
> >> >>
> >> >> I'm just looking to see if I'm going about this the right way, based on 
> >> >> the discussion we had on IRC. I've managed to hack something together, 
> >> >> I've attached a (very) WIP patch which gives the correct codegen for 
> >> >> the testcase in question 
> >> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
> >> >> obviously need to support other widening patterns and differentiate 
> >> >> between big/little endian among other things.
> >> >>
> >> >> I added a backend pattern because I wasn't quite clear which changes to 
> >> >> make in order to allow the existing backend patterns to be used with a 
> >> >> V8QI, or how to represent V16QI where we don't care about the 
> >> >> top/bottom 8. I made some attempt in optabs.c, which is in the patch 
> >> >> commented out, but I'm not sure if I'm going about this the right way.
> >> >
> >> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
> >> > confined to vectorizable_conversion.  The
> >> > only complication might be sub-optimal code-gen for the vector-vector
> >> > CTOR compensating for the input
> >> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
> >>
> >> Yeah.  I don't really like this because it means that it'll be
> >> impossible to remove the redundant work in gimple.  The extra elements
> >> are just a crutch to satisfy the type system.
> >
> > We can certainly devise a more clever way to represent a paradoxical subreg,
> > but at least the actual operation (WIDEN_MINUS_LOW) would match what
> > the hardware can do.
>
> At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
> operations.  E.g. the low-part intrinsic for signed 8-bit integers is:
>
>    int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);
>
> whereas the high-part intrinsic is:
>
>    int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);
>
> So representing the low part as a 128-bit → 128-bit operation is already
> a little artificial.

That's intrinsics, but I guess the actual machine instruction is different?

> > OTOH we could simply accept half of a vector for
> > the _LOW (little-endian) or _HIGH (big-endian) op and have the expander
> > deal with subreg frobbing?  Not that I'd like that very much though, even
> > a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
> > how to go about endianness here ... the _LOW/_HIGH paints us into some
> > corner here)
>
> I think it only makes sense for the low part.  But yeah, I guess that
> would work (although I agree it doesn't seem very appealing :-)).
>
> > A new IFN (direct optab?) means targets with existing support for _LO/HI
> > do not automatically benefit which is a shame.
>
> In practice this will only affect targets that choose to use mixed
> vector sizes, and I think it's reasonable to optimise only for the
> case in which such targets support widening conversions.  So what
> do you think about the idea of emitting separate conversions and
> a normal subtract?  We'd be relying on RTL to fuse them together,
> but at least there would be no redundancy to eliminate.

So in vectorizable_conversion for the widen-minus you'd check
whether you can do a v4qi -> v4hi and then emit a conversion
and a wide minus?  I guess as long as vectorizer costing behaves
as if the op is fused, that's a similarly OK trick to a V_C_E or a
vector CTOR.

Richard.

> Thanks,
> Richard
> >
> >> As far as Joel's patch goes, I was imagining that the new operation
> >> would be an internal function rather than a tree code.  However,
> >> if we don't want that, maybe we should just emit separate conversions
> >> and a normal subtraction, like we would for (signed) x - (unsigned) y.
> >>
> >> Thanks,
> >> Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>> >>
>> >> Hi Richard(s),
>> >>
>> >> I'm just looking to see if I'm going about this the right way, based on 
>> >> the discussion we had on IRC. I've managed to hack something together, 
>> >> I've attached a (very) WIP patch which gives the correct codegen for the 
>> >> testcase in question 
>> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would obviously 
>> >> need to support other widening patterns and differentiate between 
>> >> big/little endian among other things.
>> >>
>> >> I added a backend pattern because I wasn't quite clear which changes to 
>> >> make in order to allow the existing backend patterns to be used with a 
>> >> V8QI, or how to represent V16QI where we don't care about the top/bottom 
>> >> 8. I made some attempt in optabs.c, which is in the patch commented out, 
>> >> but I'm not sure if I'm going about this the right way.
>> >
>> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
>> > confined to vectorizable_conversion.  The
>> > only complication might be sub-optimal code-gen for the vector-vector
>> > CTOR compensating for the input
>> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>>
>> Yeah.  I don't really like this because it means that it'll be
>> impossible to remove the redundant work in gimple.  The extra elements
>> are just a crutch to satisfy the type system.
>
> We can certainly devise a more clever way to represent a paradoxical subreg,
> but at least the actual operation (WIDEN_MINUS_LOW) would match what
> the hardware can do.

At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
operations.  E.g. the low-part intrinsic for signed 8-bit integers is:

   int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);

whereas the high-part intrinsic is:

   int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);

So representing the low part as a 128-bit → 128-bit operation is already
a little artificial.
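A scalar model of the two intrinsic signatures may help show why the low part is naturally a 64-bit -> 128-bit operation. The `ref_*` functions are hypothetical reference models operating on plain arrays, not the `arm_neon.h` definitions. They only spell out which input lanes each form reads, assuming architectural lane numbering (lane 0 least significant).

```c
#include <stdint.h>

/* Low part: two 64-bit inputs (8 x int8) -> one 128-bit result
   (8 x int16).  Every input lane is used.  */
static void
ref_vsubl_s8 (int16_t out[8], const int8_t a[8], const int8_t b[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) a[i] - (int16_t) b[i];
}

/* High part: two 128-bit inputs (16 x int8), of which only lanes
   8..15 are read -> one 128-bit result (8 x int16).  */
static void
ref_vsubl_high_s8 (int16_t out[8], const int8_t a[16], const int8_t b[16])
{
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) a[i + 8] - (int16_t) b[i + 8];
}
```

Only the high-part form needs 16-lane inputs; giving the low part the same 128-bit input type would leave lanes 8..15 of each input unused.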

> OTOH we could simply accept half of a vector for
> the _LOW (little-endian) or _HIGH (big-endian) op and have the expander
> deal with subreg frobbing?  Not that I'd like that very much though, even
> a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
> how to go about endianness here ... the _LOW/_HIGH paints us into some
> corner here)

I think it only makes sense for the low part.  But yeah, I guess that
would work (although I agree it doesn't seem very appealing :-)).

> A new IFN (direct optab?) means targets with existing support for _LO/HI
> do not automatically benefit which is a shame.

In practice this will only affect targets that choose to use mixed
vector sizes, and I think it's reasonable to optimise only for the
case in which such targets support widening conversions.  So what
do you think about the idea of emitting separate conversions and
a normal subtract?  We'd be relying on RTL to fuse them together,
but at least there would be no redundancy to eliminate.

Thanks,
Richard
>
>> As far as Joel's patch goes, I was imagining that the new operation
>> would be an internal function rather than a tree code.  However,
>> if we don't want that, maybe we should just emit separate conversions
>> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>>
>> Thanks,
>> Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Biener via Gcc-patches
On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
> >>
> >> Hi Richard(s),
> >>
> >> I'm just looking to see if I'm going about this the right way, based on 
> >> the discussion we had on IRC. I've managed to hack something together, 
> >> I've attached a (very) WIP patch which gives the correct codegen for the 
> >> testcase in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). 
> >> It would obviously need to support other widening patterns and 
> >> differentiate between big/little endian among other things.
> >>
> >> I added a backend pattern because I wasn't quite clear which changes to 
> >> make in order to allow the existing backend patterns to be used with a 
> >> V8QI, or how to represent V16QI where we don't care about the top/bottom 
> >> 8. I made some attempt in optabs.c, which is in the patch commented out, 
> >> but I'm not sure if I'm going about this the right way.
> >
> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
> > confined to vectorizable_conversion.  The
> > only complication might be sub-optimal code-gen for the vector-vector
> > CTOR compensating for the input
> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>
> Yeah.  I don't really like this because it means that it'll be
> impossible to remove the redundant work in gimple.  The extra elements
> are just a crutch to satisfy the type system.

We can certainly devise a more clever way to represent a paradoxical subreg,
but at least the actual operation (WIDEN_MINUS_LOW) would match what
the hardware can do.  OTOH we could simply accept half of a vector for
the _LOW (little-endian) or _HIGH (big-endian) op and have the expander
deal with subreg frobbing?  Not that I'd like that very much though, even
a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
how to go about endianness here ... the _LOW/_HIGH paints us into some
corner here)

A new IFN (direct optab?) means targets with existing support for _LO/HI
do not automatically benefit which is a shame.

> As far as Joel's patch goes, I was imagining that the new operation
> would be an internal function rather than a tree code.  However,
> if we don't want that, maybe we should just emit separate conversions
> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>
> Thanks,
> Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>>
>> Hi Richard(s),
>>
>> I'm just looking to see if I'm going about this the right way, based on the 
>> discussion we had on IRC. I've managed to hack something together, I've 
>> attached a (very) WIP patch which gives the correct codegen for the testcase 
>> in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
>> obviously need to support other widening patterns and differentiate between 
>> big/little endian among other things.
>>
>> I added a backend pattern because I wasn't quite clear which changes to make 
>> in order to allow the existing backend patterns to be used with a V8QI, or 
>> how to represent V16QI where we don't care about the top/bottom 8. I made 
>> some attempt in optabs.c, which is in the patch commented out, but I'm not 
>> sure if I'm going about this the right way.
>
> Hmm, as said, I'd try to arrange like illustrated in the attachment,
> confined to vectorizable_conversion.  The
> only complication might be sub-optimal code-gen for the vector-vector
> CTOR compensating for the input
> vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)

Yeah.  I don't really like this because it means that it'll be
impossible to remove the redundant work in gimple.  The extra elements
are just a crutch to satisfy the type system.

As far as Joel's patch goes, I was imagining that the new operation
would be an internal function rather than a tree code.  However,
if we don't want that, maybe we should just emit separate conversions
and a normal subtraction, like we would for (signed) x - (unsigned) y.
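For comparison, the mixed-signedness case already decomposes this way, since neither a signed nor an unsigned widening subtract covers it. Below is a minimal sketch with GCC vector extensions (GCC 9 or later for `__builtin_convertvector`); the type and function names are hypothetical.

```c
/* Hypothetical types: signed and unsigned 8 x 8-bit inputs,
   8 x 16-bit signed result.  */
typedef signed char v8qi __attribute__ ((vector_size (8)));
typedef unsigned char v8uqi __attribute__ ((vector_size (8)));
typedef short v8hi __attribute__ ((vector_size (16)));

/* (signed) x - (unsigned) y: each operand gets its own extension
   (sign- and zero-extension respectively), then a normal subtraction.  */
static v8hi
mixed_widen_minus (v8qi x, v8uqi y)
{
  v8hi wx = __builtin_convertvector (x, v8hi);  /* sign-extend */
  v8hi wy = __builtin_convertvector (y, v8hi);  /* zero-extend */
  return wx - wy;
}
```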

Thanks,
Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-02 Thread Richard Biener via Gcc-patches
On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>
> Hi Richard(s),
>
> I'm just looking to see if I'm going about this the right way, based on the 
> discussion we had on IRC. I've managed to hack something together, I've 
> attached a (very) WIP patch which gives the correct codegen for the testcase 
> in question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
> obviously need to support other widening patterns and differentiate between 
> big/little endian among other things.
>
> I added a backend pattern because I wasn't quite clear which changes to make 
> in order to allow the existing backend patterns to be used with a V8QI, or 
> how to represent V16QI where we don't care about the top/bottom 8. I made 
> some attempt in optabs.c, which is in the patch commented out, but I'm not 
> sure if I'm going about this the right way.

Hmm, as said, I'd try to arrange it as illustrated in the attachment,
confined to vectorizable_conversion.  The only complication might be
sub-optimal code-gen for the vector-vector CTOR compensating for the
input vector (on RTL that would be a paradoxical subreg from say V4HI
to V8HI).

Richard.

> Joel




[RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-01 Thread Joel Hutton via Gcc-patches
Hi Richard(s),

I'm just looking to see if I'm going about this the right way, based on the 
discussion we had on IRC. I've managed to hack something together, I've 
attached a (very) WIP patch which gives the correct codegen for the testcase in 
question (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
obviously need to support other widening patterns and differentiate between 
big/little endian among other things.
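For readers without the PR at hand, a loop of roughly this shape exercises the V8QI -> V8HI case. It is a hypothetical example, not the testcase attached to PR 98772. Eight 8-bit inputs occupy only a 64-bit vector while the eight 16-bit results fill a 128-bit vector, so it is a natural candidate for a single half-width widening subtract when the target allows mixed vector sizes.

```c
#include <stdint.h>

/* Hypothetical kernel: eight byte differences stored as 16-bit values.
   The subtraction happens in int after the usual promotions, and the
   cast keeps the low 16 bits, matching an unsigned widening subtract
   per lane.  */
void
widen_sub8 (uint16_t *restrict out, const uint8_t *restrict x,
            const uint8_t *restrict y)
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) (x[i] - y[i]);
}
```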

I added a backend pattern because I wasn't quite clear which changes to make in
order to allow the existing backend patterns to be used with a V8QI, or how to
represent V16QI where we don't care about the top/bottom 8 elements. I made an
attempt in optabs.c, which is commented out in the patch, but I'm not sure if
I'm going about this the right way.

Joel
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index be2a5a865172bdd7848be4082abb0fdfb0b35937..c66b8a367623c8daf4423677d292e292feee3606 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3498,6 +3498,14 @@
   DONE;
 })
 
+(define_insn "vec_widen_usubl_half_v8qi"
+  [(match_operand:V8HI 0 "register_operand")
+   (match_operand:V8QI 1 "register_operand")
+   (match_operand:V8QI 2 "register_operand")]
+  "TARGET_SIMD"
+  "usubl\t%0., %1., %2."
+)
+
 (define_expand "vec_widen_subl_hi_"
   [(match_operand: 0 "register_operand")
    (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
diff --git a/gcc/expr.c b/gcc/expr.c
index 04ef5ad114d0662948c896cdbf58e67737b39c7e..0939a156deef63f1cf2fa7e29c2c94925820f2ba 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9785,6 +9785,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
     case VEC_WIDEN_PLUS_HI_EXPR:
     case VEC_WIDEN_PLUS_LO_EXPR:
+    case VEC_WIDEN_MINUS_HALF_EXPR:
     case VEC_WIDEN_MINUS_HI_EXPR:
     case VEC_WIDEN_MINUS_LO_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 876a3a6f348de122e5a52e6dd70d7946bc810162..10aa21d07595325fd8ef3057444853fc946385de 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -186,6 +186,9 @@ bool can_vec_perm_const_p (machine_mode, const vec_perm_indices &,
 enum insn_code find_widening_optab_handler_and_mode (optab, machine_mode,
 						     machine_mode,
 						     machine_mode *);
+enum insn_code find_half_mode_optab_and_mode (optab, machine_mode,
+					      machine_mode,
+					      machine_mode *);
 int can_mult_highpart_p (machine_mode, bool);
 bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 opt_machine_mode get_len_load_store_mode (machine_mode, bool);
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index 3248ce2c06e65c9c0366757907ab057407f7c594..7abfc04aa18b7ee5b734a1b1f4378b4615ee31fd 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -462,6 +462,17 @@ can_vec_perm_const_p (machine_mode mode, const vec_perm_indices &sel,
   return false;
 }
 
+enum insn_code
+find_half_mode_optab_and_mode (optab op, machine_mode to_mode,
+			       machine_mode from_mode,
+			       machine_mode *found_mode)
+{
+  insn_code icode = CODE_FOR_nothing;
+  if (GET_MODE_2XWIDER_MODE (from_mode).exists (found_mode))
+    icode = optab_handler (op, *found_mode);
+  return icode;
+}
+
 /* Find a widening optab even if it doesn't widen as much as we want.
    E.g. if from_mode is HImode, and to_mode is DImode, and there is no
    direct HI->SI insn, then return SI->DI, if that exists.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index c94073e3ed98f8c4cab65891f65dedebdb1ec274..eb52dc15f8094594c4aa22d5fc1c442886e4ebf6 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -185,6 +185,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
     case VEC_WIDEN_MINUS_HI_EXPR:
       return (TYPE_UNSIGNED (type)
 	      ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
+
+    case VEC_WIDEN_MINUS_HALF_EXPR:
+      return vec_widen_usubl_half_optab;
 
     case VEC_UNPACK_HI_EXPR:
       return (TYPE_UNSIGNED (type)
@@ -308,6 +311,16 @@ supportable_convert_operation (enum tree_code code,
   if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
     return false;
 
+  /* The case where vectype_in is half the vector width, as opposed to the
+     normal case for widening patterns of vector width input, with output in
+     multiple registers. */
+  if (code == WIDEN_MINUS_EXPR &&
+      known_eq (TYPE_VECTOR_SUBPARTS (vectype_in), TYPE_VECTOR_SUBPARTS (vectype_out)))
+    {
+      *code1 = VEC_WIDEN_MINUS_HALF_EXPR;
+      return true;
+    }
+
   /* First check if we can done conversion directly.  */
   if ((code == FIX_TRUNC_EXPR
        && can_fix_p (m1,m2,TYPE_UNSIGNED (vectype_out), &truncp)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..1252097be9893d7d65ea844fc0eda9bad70b9256 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -293,6 +293,13 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,