Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-12 Thread Richard Sandiford
Richard Henderson writes:
> On 12/11/2011 04:50 AM, Richard Sandiford wrote:
>> [Mingjie, please could you help with the Loongson question near the end?]
>
> Actually, can you tell me how to test these abi combinations?  I keep
> trying to use mips-sim or mips64-sim and get linker errors complaining
> of abi combinations.

I tend to use mips64{,el}-linux-gnu with a hacked-up QEMU (hacked up to
add MIPS16 to the cpu model, which isn't relevant here).  But I'm surprised
*-elf is causing problems.  Something like mipsisa64-elfoabi ought to just
work (I last tested that a few weeks ago).

>>  Little-endian:
>> 
>> The semantics of the RTL pattern are:
>> 
>>  { 0L, 0U } = { X[I3], X[I4 + 2] }, where X = { 1L, 1U, 2L, 2U }
>> 
>>  so: 0L = { 1L, 1U }[I3] (= <bop><bUL>)
>>      0U = { 2L, 2U }[I4] (= <aop><aUL>)
>> 
>>      <aop> = 2, <aUL> = I4 ? U : L
>>      <bop> = 1, <bUL> = I3 ? U : L
>> 
>>      [LL] !I4 && !I3   [UL] I4 && !I3
>>      [LU] !I4 && I3    [UU] I4 && I3
>> 
>>  Big-endian:
>> 
>> The semantics of the RTL pattern are:
>> 
>>  { 0U, 0L } = { X[I3], X[I4 + 2] }, where X = { 1U, 1L, 2U, 2L }
>> 
>>  so: 0U = { 1U, 1L }[I3] (= <aop><aUL>)
>>      0L = { 2U, 2L }[I4] (= <bop><bUL>)
>> 
>>      <aop> = 1, <aUL> = I3 ? L : U
>>      <bop> = 2, <bUL> = I4 ? L : U
>> 
>>      [UU] !I3 && !I4   [UL] !I3 && I4
>>      [LU] I3 && !I4    [LL] I3 && I4.  */
>> 
>> which suggests that the PUL and PLU entries for big-endian should be
>> the other way around.  Does that sound right, or have I misunderstood?
>
> Yes, that sounds right.
>
>> ...for little-endian, we need to pass the "U" and "L" components of the
>> mnemonic in the reverse order: the MIPS instruction specifies the upper
>> part first, whereas the rtl pattern specifies the lower part first.
>> And for little-endian, U refers to memory element 1 and L to memory
>> element 0.  So I think this should be:
>
> ... Except that the actual output of the LE insn actually swaps the
> operands too.  So I think these expanders should not *also* swap the
> operands.  I've tidied these up a bit since then.

Hmm, are you sure?  The order of the operands passed to these p?? expanders
is supposed to match the order of the operands in the final asm instruction.
A user's "A = __builtin_mips_plu_ps (B, C)" corresponds to
"gen_mips_plu_ps (A, B, C)", which must always generate "PLU.PS A, B, C", etc.
So if the define_insn swaps the operands (which from above, it must for
little-endian), then these expanders need to swap too, to undo the effect.
Or, taking the longer version from yesterday:

;; Expanders for builtins.  The instruction:
;;
;;     P[UL][UL].PS <result>, <a>, <b>
;;
;; says that the upper part of <result> is taken from half of <a> and
;; the lower part of <result> is taken from half of <b>.  This means
;; that the P[UL][UL].PS operand order matches memory order on big-endian
;; targets; <a> is element 0 of the V2SF result while <b> is element 1.
;; However, the P[UL][UL].PS operand order is the reverse of memory order
;; on little-endian targets; <a> is element 1 of the V2SF result while
;; <b> is element 0.  The arguments to vec_perm_const_ps are always in
;; memory order.
;;
;; Similarly, "U" corresponds to element 0 on big-endian targets but
;; to element 1 on little-endian targets.

(would be nice to have these comments in the patch if nothing else).

Because of that, I think I preferred the original style, with no
SET rtl pattern in the expander, and calls to emit_insn (gen_...)
in the C code.
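
As a rough illustration of the argument above (a toy model only, not GCC code), the following C sketch treats a V2SF value as two floats in memory order, models "PLU.PS A, B, C" as "upper half of A from the lower half of B, lower half of A from the upper half of C", and models the vec_perm_const_ps RTL as a selection from the memory-order concatenation of its two inputs.  The names v2sf, model_plu_ps, model_vec_perm_const_ps and expand_plu_ps are invented for this sketch, and the selector values in expand_plu_ps are derived from the model rather than taken from any posted patch.

```c
#include <assert.h>

/* Toy model, not GCC code: a V2SF value as two floats in memory order.  */
typedef struct { float elt[2]; } v2sf;

/* Memory index of the upper half: 0 on big-endian, 1 on little-endian.  */
static int
upper_index (int big_endian)
{
  return big_endian ? 0 : 1;
}

/* Model of "PLU.PS a, b, c": the upper half of the result is the lower
   half of b, and the lower half of the result is the upper half of c.  */
static v2sf
model_plu_ps (int big_endian, v2sf b, v2sf c)
{
  int u = upper_index (big_endian);
  int l = 1 - u;
  v2sf a;
  a.elt[u] = b.elt[l];
  a.elt[l] = c.elt[u];
  return a;
}

/* Model of the RTL pattern: pick element sel0 (0 or 1) and element sel1
   (2 or 3) from the memory-order concatenation of op1 and op2.  */
static v2sf
model_vec_perm_const_ps (v2sf op1, v2sf op2, int sel0, int sel1)
{
  v2sf r;
  assert ((sel0 == 0 || sel0 == 1) && (sel1 == 2 || sel1 == 3));
  r.elt[0] = op1.elt[sel0];
  r.elt[1] = op2.elt[sel1 - 2];
  return r;
}

/* Hypothetical expander for the PLU.PS builtin under this model: the
   inputs are swapped for little-endian so that the net effect is the
   same instruction semantics on both endiannesses.  */
static v2sf
expand_plu_ps (int big_endian, v2sf b, v2sf c)
{
  if (big_endian)
    return model_vec_perm_const_ps (b, c, 1, 2);
  else
    return model_vec_perm_const_ps (c, b, 1, 2);
}
```

Under these assumptions, expand_plu_ps agrees with model_plu_ps for both endiannesses, which is the "swap again to undo the effect" point made above; it says nothing about how the final patch should be structured.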

>> I think this is endian-dependent.  For little-endian, the bottom two bits
>> of the mask determine element 0; for big-endian, the top two bits of the
>> mask do. 
>
> Recall that loongson can only run in little-endian.

Doh.

> I added comments about that in the md file, but it would do no harm to
> add another here.

Thanks.

Richard


Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-11 Thread Hans-Peter Nilsson
On Sun, 11 Dec 2011, Richard Sandiford wrote:
> Hans-Peter Nilsson writes:
> > Please also consider incrementing __mips_loongson_vector_rev

> For avoidance of doubt, that only applies to the latter ("as H-P
> suggests") option.  The patch as posted keeps the public interface
> the same.

Correct; I misread it, sorry.

brgds, H-P


Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-11 Thread Richard Henderson
On 12/11/2011 04:50 AM, Richard Sandiford wrote:
> [Mingjie, please could you help with the Loongson question near the end?]

Actually, can you tell me how to test these abi combinations?  I keep trying to 
use mips-sim or mips64-sim and get linker errors complaining of abi 
combinations.

>  Little-endian:
> 
> The semantics of the RTL pattern are:
> 
>   { 0L, 0U } = { X[I3], X[I4 + 2] }, where X = { 1L, 1U, 2L, 2U }
> 
>   so: 0L = { 1L, 1U }[I3] (= <bop><bUL>)
>       0U = { 2L, 2U }[I4] (= <aop><aUL>)
> 
>       <aop> = 2, <aUL> = I4 ? U : L
>       <bop> = 1, <bUL> = I3 ? U : L
> 
>       [LL] !I4 && !I3   [UL] I4 && !I3
>       [LU] !I4 && I3    [UU] I4 && I3
> 
>  Big-endian:
> 
> The semantics of the RTL pattern are:
> 
>   { 0U, 0L } = { X[I3], X[I4 + 2] }, where X = { 1U, 1L, 2U, 2L }
> 
>   so: 0U = { 1U, 1L }[I3] (= <aop><aUL>)
>       0L = { 2U, 2L }[I4] (= <bop><bUL>)
> 
>       <aop> = 1, <aUL> = I3 ? L : U
>       <bop> = 2, <bUL> = I4 ? L : U
> 
>       [UU] !I3 && !I4   [UL] !I3 && I4
>       [LU] I3 && !I4    [LL] I3 && I4.  */
> 
> which suggests that the PUL and PLU entries for big-endian should be
> the other way around.  Does that sound right, or have I misunderstood?

Yes, that sounds right.

> ...for little-endian, we need to pass the "U" and "L" components of the
> mnemonic in the reverse order: the MIPS instruction specifies the upper
> part first, whereas the rtl pattern specifies the lower part first.
> And for little-endian, U refers to memory element 1 and L to memory
> element 0.  So I think this should be:

... Except that the actual output of the LE insn actually swaps the operands 
too.  So I think these expanders should not *also* swap the operands.  I've 
tidied these up a bit since then.

>> +static bool
>> +mips_expand_vpc_ps (struct expand_vec_perm_d *d)

I've eliminated this function since then.

>> +  /* Convert the selector into the packed 8-bit form for pshufh.  */
>> +  for (i = mask = 0; i < 4; i++)
>> +mask |= (d->perm[i] & 3) << (i * 2);
> 
> I think this is endian-dependent.  For little-endian, the bottom two bits
> of the mask determine element 0; for big-endian, the top two bits of the
> mask do. 

Recall that loongson can only run in little-endian.  I added comments about 
that in the md file, but it would do no harm to add another here.
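
For readers following the mask discussion, here is a small C sketch of the packing loop quoted above together with an assumed little-endian model of the shuffle.  The function names pack_pshufh_mask and model_pshufh_le are made up for illustration, and the model (result halfword i is the source halfword selected by bits 2i+1..2i of the mask) is an assumption consistent with the "bottom two bits determine element 0" description, not something taken from the Loongson manual.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 4-element selector into the 8-bit shuffle immediate, as in the
   quoted loop: two bits per element, element 0 in the lowest bits.  */
static unsigned
pack_pshufh_mask (const unsigned char perm[4])
{
  unsigned mask = 0;
  for (int i = 0; i < 4; i++)
    mask |= (unsigned) (perm[i] & 3) << (i * 2);
  return mask;
}

/* Assumed little-endian model of the shuffle: result halfword I is the
   source halfword selected by bits [2*I+1 : 2*I] of MASK.  */
static void
model_pshufh_le (uint16_t dst[4], const uint16_t src[4], unsigned mask)
{
  assert (mask < 256);
  for (int i = 0; i < 4; i++)
    dst[i] = src[(mask >> (i * 2)) & 3];
}
```

With this layout a selector of { 3, 2, 1, 0 } packs to 0x1b and reverses the four halfwords; a big-endian variant would need element 0 in the top two bits instead, which is moot if the target only runs little-endian.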

> (There's a machine in the farm, but bootstrapping on it is rather slow.)

Yeah, I started checking out the tree there yesterday and it never completed.

> I think a lot of the endianness stuff in the patch is dependent on byte
> endianness rather than word endianness.  Since we only support two out
> of the four combinations, it seems better not to worry which and simply
> use TARGET_{BIG,LITTLE}_ENDIAN instead of {WORDS,BYTES}_{BIG,LITTLE}_ENDIAN.

Sure.

This is my current patch, which doesn't have the pul/plu insns swapped, as 
suggested above.  I did change the loongson.h interface as H-P suggested.


r~
commit b7790c7a9e53d66d1f348c3f2adb5b8a9bf2d93c
Author: Richard Henderson 
Date:   Wed Dec 7 14:17:02 2011 -0800

mips: Implement vec_perm_const.

diff --git a/gcc/config/mips/loongson.h b/gcc/config/mips/loongson.h
index 6bfd4d7..dfd6505 100644
--- a/gcc/config/mips/loongson.h
+++ b/gcc/config/mips/loongson.h
@@ -447,15 +447,15 @@ psadbh (uint8x8_t s, uint8x8_t t)
 
 /* Shuffle halfwords.  */
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order)
+pshufh_u (uint16x4_t s, uint8_t order)
 {
-  return __builtin_loongson_pshufh_u (dest, s, order);
+  return __builtin_loongson_pshufh_u (s, order);
 }
 
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order)
+pshufh_s (int16x4_t s, uint8_t order)
 {
-  return __builtin_loongson_pshufh_s (dest, s, order);
+  return __builtin_loongson_pshufh_s (s, order);
 }
 
 /* Shift left logical.  */
diff --git a/gcc/config/mips/loongson.md b/gcc/config/mips/loongson.md
index 225f4d1..7c7e29f 100644
--- a/gcc/config/mips/loongson.md
+++ b/gcc/config/mips/loongson.md
@@ -24,10 +24,7 @@
   UNSPEC_LOONGSON_PCMPEQ
   UNSPEC_LOONGSON_PCMPGT
   UNSPEC_LOONGSON_PEXTR
-  UNSPEC_LOONGSON_PINSR_0
-  UNSPEC_LOONGSON_PINSR_1
-  UNSPEC_LOONGSON_PINSR_2
-  UNSPEC_LOONGSON_PINSR_3
+  UNSPEC_LOONGSON_PINSRH
   UNSPEC_LOONGSON_PMADD
   UNSPEC_LOONGSON_PMOVMSK
   UNSPEC_LOONGSON_PMULHU
@@ -200,6 +197,51 @@
   "pandn\t%0,%1,%2"
   [(set_attr "type" "fmul")])
 
+;; Logical AND.
+(define_insn "*loongson_and"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+   (and:VWHB (match_operand:VWHB 1 "register_operand" "f")
+ (match_operand:VWHB 2 "register_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+  "and\t%0,%1,%2"
+  [(set_attr "type" "fmul")])
+
+;; Logical OR.
+(define_insn "*loongson_or"
+  [(set (match_operand:VWHB 0 "register_operand" "=f")
+   (ior:VWHB (match_operand:VWHB 1 "register_ope

Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-11 Thread Richard Sandiford
Hans-Peter Nilsson writes:
> On Sun, 11 Dec 2011, Richard Sandiford wrote:
>> [Mingjie, please could you help with the Loongson question near the end?]
>
>> As H-P mentioned, this changes the __builtin_* interface for the PSHUFH
>> intrinsics.  These intrinsics are supposed to be used via the inline
>> wrappers in loongson.h, so we can either keep the unused argument in
>> the pshufh_{u,s} or, as H-P suggests, remove the argument from both.
>> I don't know which is better.  loongson.h needs to change either way,
>> so in the patch below, I went for the former.  The latter would need
>> testsuite changes too.  Mingjie, which do you think is best?
>
> Please also consider incrementing __mips_loongson_vector_rev, or
> if currently empty, set to 1.  And mention PR48068 in the
> changelog; fixed in part.

For avoidance of doubt, that only applies to the latter ("as H-P
suggests") option.  The patch as posted keeps the public interface
the same.

Richard


Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-11 Thread Hans-Peter Nilsson
On Sun, 11 Dec 2011, Richard Sandiford wrote:
> [Mingjie, please could you help with the Loongson question near the end?]

> As H-P mentioned, this changes the __builtin_* interface for the PSHUFH
> intrinsics.  These intrinsics are supposed to be used via the inline
> wrappers in loongson.h, so we can either keep the unused argument in
> the pshufh_{u,s} or, as H-P suggests, remove the argument from both.
> I don't know which is better.  loongson.h needs to change either way,
> so in the patch below, I went for the former.  The latter would need
> testsuite changes too.  Mingjie, which do you think is best?

Please also consider incrementing __mips_loongson_vector_rev, or
if currently empty, set to 1.  And mention PR48068 in the
changelog; fixed in part.

(I can't see what builtin_define does, set it to 1 or just defined?)

brgds, H-P


Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-11 Thread Richard Sandiford
[Mingjie, please could you help with the Loongson question near the end?]

Richard Henderson writes:
> @@ -89,61 +89,102 @@
>DONE;
>  })
>  
> -; pul.ps - Pair Upper Lower
> -(define_insn "mips_pul_ps"
> +(define_insn "vec_perm_const_ps"
>[(set (match_operand:V2SF 0 "register_operand" "=f")
> - (vec_merge:V2SF
> -  (match_operand:V2SF 1 "register_operand" "f")
> -  (match_operand:V2SF 2 "register_operand" "f")
> -  (const_int 2)))]
> + (vec_select:V2SF
> +   (vec_concat:V4SF
> + (match_operand:V2SF 1 "register_operand" "f")
> + (match_operand:V2SF 2 "register_operand" "f"))
> +   (parallel [(match_operand:SI 3 "const_0_or_1_operand" "")
> +  (match_operand:SI 4 "const_2_or_3_operand" "")])))]
>"TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> -  "pul.ps\t%0,%1,%2"
> +{
> +  static const int * const mnemonics[2][4] = {
> +/* LE */ { "pll.ps\t%0,%2,%1", "pul.ps\t%0,%2,%1",
> +"plu.ps\t%0,%2,%1", "puu.ps\t%0,%2,%1" },
> +/* BE */ { "puu.ps\t%0,%1,%2", "plu.ps\t%0,%1,%2",
> +"pul.ps\t%0,%1,%2", "pll.ps\t%0,%1,%2" },
> +  };
> +
> +  unsigned mask = INTVAL (operands[3]) * 2 + (INTVAL (operands[4]) - 2);
> +  return mnemonics[WORDS_BIG_ENDIAN][mask];
> +}

So I stared at this for fully an hour trying to work out all the
various orderings (vec_concat operands always in memory order,
parallel selector always in memory order, GCC vector element 0
being "upper" on big-endian and "lower" on little-endian,
P??.PS always specifying the upper part of the result first, etc.).
I ended up with:

  /* Let <op>L be the lower part of operand <op> and <op>U be the upper part.
     The P[UL][UL].PS instruction always specifies the upper part of the
     result first, so the instruction is:

        P<aUL><bUL>.PS %0,<aop>,<bop>

     where 0U == <aop><aUL> and 0L == <bop><bUL>.

     GCC's vector indices are specified in memory order, which means
     that vector element 0 is the lower part (L) on little-endian targets
     and the upper part (U) on big-endian targets.  vec_concat likewise
     concatenates in memory order, which means that operand 3 (being
     0 or 1) selects part of operand 1 and operand 4 (being 2 or 3)
     selects part of operand 2.

     Let:

        I3 = INTVAL (operands[3])
        I4 = INTVAL (operands[4]) - 2

     Taking the two endiannesses in turn:

     Little-endian:

        The semantics of the RTL pattern are:

        { 0L, 0U } = { X[I3], X[I4 + 2] }, where X = { 1L, 1U, 2L, 2U }

        so: 0L = { 1L, 1U }[I3] (= <bop><bUL>)
            0U = { 2L, 2U }[I4] (= <aop><aUL>)

            <aop> = 2, <aUL> = I4 ? U : L
            <bop> = 1, <bUL> = I3 ? U : L

            [LL] !I4 && !I3   [UL] I4 && !I3
            [LU] !I4 && I3    [UU] I4 && I3

     Big-endian:

        The semantics of the RTL pattern are:

        { 0U, 0L } = { X[I3], X[I4 + 2] }, where X = { 1U, 1L, 2U, 2L }

        so: 0U = { 1U, 1L }[I3] (= <aop><aUL>)
            0L = { 2U, 2L }[I4] (= <bop><bUL>)

            <aop> = 1, <aUL> = I3 ? L : U
            <bop> = 2, <bUL> = I4 ? L : U

            [UU] !I3 && !I4   [UL] !I3 && I4
            [LU] I3 && !I4    [LL] I3 && I4.  */

which suggests that the PUL and PLU entries for big-endian should be
the other way around.  Does that sound right, or have I misunderstood?
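
The derivation above can be checked mechanically.  The C sketch below is only an illustration (the helper name derive_mnemonic is invented here and is not part of GCC): it computes the mnemonic for a given endianness and selector pair directly from the "I3 ? ... : ..." rules in the comment, using the same mask = I3 * 2 + I4 indexing as the posted table.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of GCC: derive the P[UL][UL].PS mnemonic
   from the selector values, following the derivation in the comment.
   i3 is INTVAL (operands[3]) (0 or 1); i4 is INTVAL (operands[4]) - 2.  */
static void
derive_mnemonic (int big_endian, int i3, int i4, char out[8])
{
  char first, second;   /* Upper and lower result halves, in that order.  */

  assert ((i3 == 0 || i3 == 1) && (i4 == 0 || i4 == 1));
  if (big_endian)
    {
      /* 0U comes from operand 1 and 0L from operand 2; memory
         element 0 is the upper part.  */
      first = i3 ? 'l' : 'u';
      second = i4 ? 'l' : 'u';
    }
  else
    {
      /* 0U comes from operand 2 and 0L from operand 1; memory
         element 0 is the lower part.  */
      first = i4 ? 'u' : 'l';
      second = i3 ? 'u' : 'l';
    }
  snprintf (out, 8, "p%c%c.ps", first, second);
}
```

Calling derive_mnemonic (be, mask >> 1, mask & 1, buf) for mask 0 to 3 gives pll, pul, plu, puu for little-endian, matching the LE row of the posted table, and puu, pul, plu, pll for big-endian, where the posted BE row has plu and pul in the opposite order.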

(Also, "const char *" rather than "const int *".)

The same confusion hit me with the expanders:

> +(define_expand "mips_pul_ps"
> +  [(match_operand:V2SF 0 "register_operand" "")
> +   (match_operand:V2SF 1 "register_operand" "")
> +   (match_operand:V2SF 2 "register_operand" "")]
> +  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> +{
> +  if (WORDS_BIG_ENDIAN)
> +emit_insn (gen_vec_perm_const_ps (operands[0], operands[1], operands[2],
> +   const0_rtx, const2_rtx));
> +  else
> +emit_insn (gen_vec_perm_const_ps (operands[0], operands[2], operands[1],
> +   const1_rtx, GEN_INT (3)));
> +  DONE;
> +})

This one looks like a pasto: the operands given here are the same
as for mips_puu_ps.  But...

> +(define_expand "mips_plu_ps"
> +  [(match_operand:V2SF 0 "register_operand" "")
> +   (match_operand:V2SF 1 "register_operand" "")
> +   (match_operand:V2SF 2 "register_operand" "")]
> +  "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
> +{
> +  if (WORDS_BIG_ENDIAN)
> +emit_insn (gen_vec_perm_const_ps (operands[0], operands[1], operands[2],
> +   const1_rtx, const2_rtx));
> +  else
> +emit_insn (gen_vec_perm_const_ps (operands[0], operands[2], operands[1],
> +   const0_rtx, GEN_INT (3)));
> +  DONE;
> +})

...for little-endian, we need to pass the "U" and "L" components of the
mnemonic in the reverse order: the MIPS instruction specifies the upper
part first, whereas the rtl pattern specifies the lower part first.
And for little-endian, U refers to memory element 1 and L to memory
element 0.  So I think this should be:

  if (WORDS_BIG_ENDIAN)
emit_insn (gen_vec_perm_const_ps (oper

Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-09 Thread Richard Henderson
On 12/08/2011 10:08 PM, Hans-Peter Nilsson wrote:
> On Thu, 8 Dec 2011, Richard Henderson wrote:
>> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>> index d3fd709..f1c3665 100644
>> --- a/gcc/config/mips/mips.c
>> +++ b/gcc/config/mips/mips.c
> 
>> @@ -13021,8 +13015,8 @@ static const struct mips_builtin_description mips_builtins[] = {
>>LOONGSON_BUILTIN (pasubub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
>>LOONGSON_BUILTIN (biadd, MIPS_UV4HI_FTYPE_UV8QI),
>>LOONGSON_BUILTIN (psadbh, MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
>> -  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
>> -  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
>> +  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
>> +  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
>>LOONGSON_BUILTIN_SUFFIX (psllh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
> 
> Looks like a brute-force (ignoring backward compatibility) fix
> for PR48068 item 2.  If going that route, I'd suggest at least
> increment the __mips_loongson_vector_rev.  Also, loongson.h
> needs the corresponding adjustment.

Thanks for the pointer.  I'll clean this up along the increment
revision line, unless Richard S has another suggestion.


r~


Re: [PATCH 5/6] mips: Implement vec_perm_const.

2011-12-08 Thread Hans-Peter Nilsson
On Thu, 8 Dec 2011, Richard Henderson wrote:
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index d3fd709..f1c3665 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c

> @@ -13021,8 +13015,8 @@ static const struct mips_builtin_description mips_builtins[] = {
>LOONGSON_BUILTIN (pasubub, MIPS_UV8QI_FTYPE_UV8QI_UV8QI),
>LOONGSON_BUILTIN (biadd, MIPS_UV4HI_FTYPE_UV8QI),
>LOONGSON_BUILTIN (psadbh, MIPS_UV4HI_FTYPE_UV8QI_UV8QI),
> -  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UV4HI_UQI),
> -  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_V4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (pshufh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),
> +  LOONGSON_BUILTIN_SUFFIX (pshufh, s, MIPS_V4HI_FTYPE_V4HI_UQI),
>LOONGSON_BUILTIN_SUFFIX (psllh, u, MIPS_UV4HI_FTYPE_UV4HI_UQI),

Looks like a brute-force (ignoring backward compatibility) fix
for PR48068 item 2.  If going that route, I'd suggest at least
increment the __mips_loongson_vector_rev.  Also, loongson.h
needs the corresponding adjustment.

(No specific interest in Loongson, FWIW.)

brgds, H-P


[PATCH 5/6] mips: Implement vec_perm_const.

2011-12-08 Thread Richard Henderson
---
 gcc/config/mips/loongson.md|   24 +++-
 gcc/config/mips/mips-modes.def |1 +
 gcc/config/mips/mips-protos.h  |1 +
 gcc/config/mips/mips-ps-3d.md  |  145 ++
 gcc/config/mips/mips.c |  266 ++--
 gcc/config/mips/predicates.md  |7 +-
 6 files changed, 376 insertions(+), 68 deletions(-)

diff --git a/gcc/config/mips/loongson.md b/gcc/config/mips/loongson.md
index 225f4d1..23c37d7 100644
--- a/gcc/config/mips/loongson.md
+++ b/gcc/config/mips/loongson.md
@@ -403,12 +403,11 @@
 ;; Shuffle halfwords.
 (define_insn "loongson_pshufh"
   [(set (match_operand:VH 0 "register_operand" "=f")
-(unspec:VH [(match_operand:VH 1 "register_operand" "0")
-   (match_operand:VH 2 "register_operand" "f")
-   (match_operand:SI 3 "register_operand" "f")]
+(unspec:VH [(match_operand:VH 1 "register_operand" "f")
+   (match_operand:SI 2 "register_operand" "f")]
   UNSPEC_LOONGSON_PSHUFH))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-  "pshufh\t%0,%2,%3"
+  "pshufh\t%0,%1,%2"
   [(set_attr "type" "fmul")])
 
 ;; Shift left logical.
@@ -479,7 +478,7 @@
   [(set_attr "type" "fadd")])
 
 ;; Unpack high data.
-(define_insn "vec_interleave_high"
+(define_insn "loongson_punpckh"
   [(set (match_operand:VWHB 0 "register_operand" "=f")
 (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
  (match_operand:VWHB 2 "register_operand" "f")]
@@ -489,7 +488,7 @@
   [(set_attr "type" "fdiv")])
 
 ;; Unpack low data.
-(define_insn "vec_interleave_low"
+(define_insn "loongson_punpckl"
   [(set (match_operand:VWHB 0 "register_operand" "=f")
 (unspec:VWHB [(match_operand:VWHB 1 "register_operand" "f")
  (match_operand:VWHB 2 "register_operand" "f")]
@@ -498,6 +497,19 @@
   "punpckl\t%0,%1,%2"
   [(set_attr "type" "fdiv")])
 
+(define_expand "vec_perm_const"
+  [(match_operand:VWHB 0 "register_operand" "")
+   (match_operand:VWHB 1 "register_operand" "")
+   (match_operand:VWHB 2 "register_operand" "")
+   (match_operand:VWHB 3 "" "")]
+  "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
+{
+  if (mips_expand_vec_perm_const (operands))
+DONE;
+  else
+FAIL;
+})
+
 ;; Integer division and modulus.  For integer multiplication, see mips.md.
 
 (define_insn "div3"
diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def
index b9c508b..03b9632 100644
--- a/gcc/config/mips/mips-modes.def
+++ b/gcc/config/mips/mips-modes.def
@@ -29,6 +29,7 @@ FLOAT_MODE (TF, 16, mips_quad_format);
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI */
 VECTOR_MODES (FLOAT, 8);  /*V4HF V2SF */
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
+VECTOR_MODES (FLOAT, 16);
 
 VECTOR_MODES (FRACT, 4);   /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);  /* V4UQQ V2UHQ */
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index dbabdff..37c958d 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -328,6 +328,7 @@ extern void mips_expand_atomic_qihi (union mips_gen_fn_ptrs,
 rtx, rtx, rtx, rtx);
 
 extern void mips_expand_vector_init (rtx, rtx);
+extern bool mips_expand_vec_perm_const (rtx op[4]);
 
 extern bool mips_eh_uses (unsigned int);
 extern bool mips_epilogue_uses (unsigned int);
diff --git a/gcc/config/mips/mips-ps-3d.md b/gcc/config/mips/mips-ps-3d.md
index 504f43c..d81abf8 100644
--- a/gcc/config/mips/mips-ps-3d.md
+++ b/gcc/config/mips/mips-ps-3d.md
@@ -89,61 +89,102 @@
   DONE;
 })
 
-; pul.ps - Pair Upper Lower
-(define_insn "mips_pul_ps"
+(define_insn "vec_perm_const_ps"
   [(set (match_operand:V2SF 0 "register_operand" "=f")
-   (vec_merge:V2SF
-(match_operand:V2SF 1 "register_operand" "f")
-(match_operand:V2SF 2 "register_operand" "f")
-(const_int 2)))]
+   (vec_select:V2SF
+ (vec_concat:V4SF
+   (match_operand:V2SF 1 "register_operand" "f")
+   (match_operand:V2SF 2 "register_operand" "f"))
+ (parallel [(match_operand:SI 3 "const_0_or_1_operand" "")
+(match_operand:SI 4 "const_2_or_3_operand" "")])))]
   "TARGET_HARD_FLOAT && TARGET_PAIRED_SINGLE_FLOAT"
-  "pul.ps\t%0,%1,%2"
+{
+  static const int * const mnemonics[2][4] = {
+/* LE */ { "pll.ps\t%0,%2,%1", "pul.ps\t%0,%2,%1",
+  "plu.ps\t%0,%2,%1", "puu.ps\t%0,%2,%1" },
+/* BE */ { "puu.ps\t%0,%1,%2", "plu.ps\t%0,%1,%2",
+  "pul.ps\t%0,%1,%2", "pll.ps\t%0,%1,%2" },
+  };
+
+  unsigned mask = INTVAL (operands[3]) * 2 + (INTVAL (operands[4]) - 2);
+  return mnemonics[WORDS_BIG_ENDIAN][mask];
+}
   [(set_attr "type" "fmove")
(set_attr "mode" "SF")])
 
-; puu.ps - Pair upper upper
-(define_insn "mips_puu_ps"
-  [(set (match_operand:V2SF 0 "register_operand" "=f")
-   (vec_merge:V2SF
-(match_operand:V2SF 1 "register_operand"