Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

2017-11-15 Thread Christophe Lyon
On 15 November 2017 at 16:58, Kyrill Tkachov wrote:
> Hi Christophe,
>
>
> On 15/11/17 15:31, Christophe Lyon wrote:
>>
>> Hi Kyrill,
>>
>>
>> On 8 November 2017 at 19:34, Kyrill Tkachov wrote:
>>>
>>> On 06/06/17 14:17, James Greenhalgh wrote:

>>>> On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> On top of the previous vec_merge simplifications [1] we can add this
>>>>> pattern to perform
>>>>> a store of a vec_concat of two 64-bit values in distinct registers as an
>>>>> STP.
>>>>> This avoids constructing such a vector explicitly in a register and
>>>>> storing it as
>>>>> a Q register.
>>>>> This way for the code in the testcase we can generate:
>>>>>
>>>>> construct_lane_1:
>>>>>   ldp d1, d0, [x0]
>>>>>   fmov d3, 1.0e+0
>>>>>   fmov d2, 2.0e+0
>>>>>   fadd d4, d1, d3
>>>>>   fadd d5, d0, d2
>>>>>   stp d4, d5, [x1, 32]
>>>>>   ret
>>>>>
>>>>> construct_lane_2:
>>>>>   ldp x2, x0, [x0]
>>>>>   add x3, x2, 1
>>>>>   add x4, x0, 2
>>>>>   stp x3, x4, [x1, 32]
>>>>>   ret
>>>>>
>>>>> instead of the current:
>>>>> construct_lane_1:
>>>>>   ldp d0, d1, [x0]
>>>>>   fmov d3, 1.0e+0
>>>>>   fmov d2, 2.0e+0
>>>>>   fadd d0, d0, d3
>>>>>   fadd d1, d1, d2
>>>>>   dup v0.2d, v0.d[0]
>>>>>   ins v0.d[1], v1.d[0]
>>>>>   str q0, [x1, 32]
>>>>>   ret
>>>>>
>>>>> construct_lane_2:
>>>>>   ldp x2, x3, [x0]
>>>>>   add x0, x2, 1
>>>>>   add x2, x3, 2
>>>>>   dup v0.2d, x0
>>>>>   ins v0.d[1], x2
>>>>>   str q0, [x1, 32]
>>>>>   ret
>>>>>
>>>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>>> Ok for GCC 8?
>>>>
>>>> OK.
>>>
>>>
>>> Thanks, I've committed this and the other patches in this series after
>>> rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
>>> The only conflict from updating the patch was that I had to use the
>>> store_16
>>> attribute rather than
>>> the old store2 for the new define_insn. This is what I've committed with
>>> r254551.
>>>
>>> Sorry for the delay in committing.
>>>
>> I've noticed that the new tests fail when testing with -mabi=ilp32:
>> FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t
>> FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
>> stp\td[0-9]+, d[0-9]+ 1 (found 0 times)
>> FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
>> stp\tx[0-9]+, x[0-9]+ 1 (found 0 times)
>>
>> Sorry if this has been reported before.
>
>
> Thank you for reporting this, I was not aware of it.
> My patch does indeed fail to generate the optimised sequence for
> -mabi=ilp32.
> During combine it fails to match:
> Failed to match this instruction:
> (set (mem:V2DF (plus:DI (reg/v/f:DI 79 [ z ])
> (const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128])
> (vec_concat:V2DF (reg:DF 81 [ y0 ])
> (reg:DF 84 [ y1 ])))
>
>
> but without the -mabi=ilp32 it does successfully match the equivalent
>
> (set (mem:V2DF (plus:DI (reg:DI 1 x1 [ z ])
> (const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128])
> (vec_concat:V2DF (reg:DF 81 [ y0 ])
> (reg:DF 84 [ y1 ])))
>
> The only difference is the index register being the hard reg x1.
> There's probably some subtlety in aarch64_classify_address that I'll need to
> dig into.
> In any case, can you please open a bug report for this so we can track it?

Sure, that's: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83009


> To be clear, the failure is just suboptimal codegen for the -mabi=ilp32
> case, not a wrong-code or ICE
> (though it should still be fixed, of course).
>
> Thanks again,
> Kyrill
>
>
>> Christophe
>>
>>> Kyrill
>>>
>>>
>>>> Thanks,
>>>> James
>>>>
>>>>> 2017-06-06  Kyrylo Tkachov
>>>>>
>>>>>   * config/aarch64/aarch64-simd.md (store_pair_lanes):
>>>>>   New pattern.
>>>>>   * config/aarch64/constraints.md (Uml): New constraint.
>>>>>   * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
>>>>>   predicate.
>>>>>
>>>>> 2017-06-06  Kyrylo Tkachov
>>>>>
>>>>>   * gcc.target/aarch64/store_v2vec_lanes.c: New test.


>


Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

2017-11-15 Thread Kyrill Tkachov

Hi Christophe,

On 15/11/17 15:31, Christophe Lyon wrote:

Hi Kyrill,


On 8 November 2017 at 19:34, Kyrill Tkachov wrote:

On 06/06/17 14:17, James Greenhalgh wrote:

On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:

Hi all,

On top of the previous vec_merge simplifications [1] we can add this
pattern to perform
a store of a vec_concat of two 64-bit values in distinct registers as an
STP.
This avoids constructing such a vector explicitly in a register and
storing it as
a Q register.
This way for the code in the testcase we can generate:

construct_lane_1:
  ldp d1, d0, [x0]
  fmov d3, 1.0e+0
  fmov d2, 2.0e+0
  fadd d4, d1, d3
  fadd d5, d0, d2
  stp d4, d5, [x1, 32]
  ret

construct_lane_2:
  ldp x2, x0, [x0]
  add x3, x2, 1
  add x4, x0, 2
  stp x3, x4, [x1, 32]
  ret

instead of the current:
construct_lane_1:
  ldp d0, d1, [x0]
  fmov d3, 1.0e+0
  fmov d2, 2.0e+0
  fadd d0, d0, d3
  fadd d1, d1, d2
  dup v0.2d, v0.d[0]
  ins v0.d[1], v1.d[0]
  str q0, [x1, 32]
  ret

construct_lane_2:
  ldp x2, x3, [x0]
  add x0, x2, 1
  add x2, x3, 2
  dup v0.2d, x0
  ins v0.d[1], x2
  str q0, [x1, 32]
  ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?

OK.


Thanks, I've committed this and the other patches in this series after
rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
The only conflict from updating the patch was that I had to use the store_16
attribute rather than
the old store2 for the new define_insn. This is what I've committed with
r254551.

Sorry for the delay in committing.


I've noticed that the new tests fail when testing with -mabi=ilp32:
FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t
FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\td[0-9]+, d[0-9]+ 1 (found 0 times)
FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\tx[0-9]+, x[0-9]+ 1 (found 0 times)

Sorry if this has been reported before.


Thank you for reporting this, I was not aware of it.
My patch does indeed fail to generate the optimised sequence for 
-mabi=ilp32.

During combine it fails to match:
Failed to match this instruction:
(set (mem:V2DF (plus:DI (reg/v/f:DI 79 [ z ])
(const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128])
(vec_concat:V2DF (reg:DF 81 [ y0 ])
(reg:DF 84 [ y1 ])))


but without the -mabi=ilp32 it does successfully match the equivalent

(set (mem:V2DF (plus:DI (reg:DI 1 x1 [ z ])
(const_int 32 [0x20])) [1 MEM[(v2df *)z_8(D) + 32B]+0 S16 A128])
(vec_concat:V2DF (reg:DF 81 [ y0 ])
(reg:DF 84 [ y1 ])))

The only difference is the index register being the hard reg x1.
There's probably some subtlety in aarch64_classify_address that I'll need to
dig into.

In any case, can you please open a bug report for this so we can track it?
To be clear, the failure is just suboptimal codegen for the -mabi=ilp32
case, not a wrong-code or ICE
(though it should still be fixed, of course).

Thanks again,
Kyrill


Christophe


Kyrill



Thanks,
James


2017-06-06  Kyrylo Tkachov  

  * config/aarch64/aarch64-simd.md (store_pair_lanes):
  New pattern.
  * config/aarch64/constraints.md (Uml): New constraint.
  * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
  predicate.

2017-06-06  Kyrylo Tkachov  

  * gcc.target/aarch64/store_v2vec_lanes.c: New test.






Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

2017-11-15 Thread Christophe Lyon
Hi Kyrill,


On 8 November 2017 at 19:34, Kyrill Tkachov wrote:
>
> On 06/06/17 14:17, James Greenhalgh wrote:
>>
>> On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:
>>>
>>> Hi all,
>>>
>>> On top of the previous vec_merge simplifications [1] we can add this
>>> pattern to perform
>>> a store of a vec_concat of two 64-bit values in distinct registers as an
>>> STP.
>>> This avoids constructing such a vector explicitly in a register and
>>> storing it as
>>> a Q register.
>>> This way for the code in the testcase we can generate:
>>>
>>> construct_lane_1:
>>>  ldp d1, d0, [x0]
>>>  fmov d3, 1.0e+0
>>>  fmov d2, 2.0e+0
>>>  fadd d4, d1, d3
>>>  fadd d5, d0, d2
>>>  stp d4, d5, [x1, 32]
>>>  ret
>>>
>>> construct_lane_2:
>>>  ldp x2, x0, [x0]
>>>  add x3, x2, 1
>>>  add x4, x0, 2
>>>  stp x3, x4, [x1, 32]
>>>  ret
>>>
>>> instead of the current:
>>> construct_lane_1:
>>>  ldp d0, d1, [x0]
>>>  fmov d3, 1.0e+0
>>>  fmov d2, 2.0e+0
>>>  fadd d0, d0, d3
>>>  fadd d1, d1, d2
>>>  dup v0.2d, v0.d[0]
>>>  ins v0.d[1], v1.d[0]
>>>  str q0, [x1, 32]
>>>  ret
>>>
>>> construct_lane_2:
>>>  ldp x2, x3, [x0]
>>>  add x0, x2, 1
>>>  add x2, x3, 2
>>>  dup v0.2d, x0
>>>  ins v0.d[1], x2
>>>  str q0, [x1, 32]
>>>  ret
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> Ok for GCC 8?
>>
>> OK.
>
>
> Thanks, I've committed this and the other patches in this series after
> rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
> The only conflict from updating the patch was that I had to use the store_16
> attribute rather than
> the old store2 for the new define_insn. This is what I've committed with
> r254551.
>
> Sorry for the delay in committing.
>

I've noticed that the new tests fail when testing with -mabi=ilp32:
FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-not ins\t
FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\td[0-9]+, d[0-9]+ 1 (found 0 times)
FAIL:gcc.target/aarch64/store_v2vec_lanes.c scan-assembler-times
stp\tx[0-9]+, x[0-9]+ 1 (found 0 times)

Sorry if this has been reported before.
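
For readers unfamiliar with the scan-assembler checks quoted in the FAIL lines above, here is a rough sketch of what they count. It is only an approximation: DejaGnu evaluates these patterns with Tcl regular expressions, whereas this sketch uses POSIX extended regexes from <regex.h>. The helper count_matches and the two sample strings are hypothetical; the strings are hand-written from the sequences quoted earlier in the thread, not real -mabi=ilp32 output.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of the POSIX
   extended regex PATTERN in TEXT.  Returns -1 if PATTERN does not
   compile.  */
static int
count_matches (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  int count = 0;

  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  while (*text != '\0' && regexec (&re, text, 1, &m, 0) == 0)
    {
      count++;
      /* Advance past the match; step one byte on an empty match.  */
      text += m.rm_eo > 0 ? m.rm_eo : 1;
    }
  regfree (&re);
  return count;
}

/* Tab-separated assembly fragments, hand-copied from the thread.  */
static const char with_stp[] =
  "\tfadd\td5, d0, d2\n"
  "\tstp\td4, d5, [x1, 32]\n"
  "\tret\n";

static const char with_ins[] =
  "\tdup\tv0.2d, v0.d[0]\n"
  "\tins\tv0.d[1], v1.d[0]\n"
  "\tstr\tq0, [x1, 32]\n"
  "\tret\n";
```

With these inputs, the pattern "stp\td[0-9]+, d[0-9]+" is found once in with_stp and not at all in with_ins, which corresponds to the "(found 0 times)" message, while "ins\t" is found in with_ins, which is what scan-assembler-not rejects.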

Christophe

> Kyrill
>
>
>> Thanks,
>> James
>>
>>> 2017-06-06  Kyrylo Tkachov  
>>>
>>>  * config/aarch64/aarch64-simd.md (store_pair_lanes):
>>>  New pattern.
>>>  * config/aarch64/constraints.md (Uml): New constraint.
>>>  * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
>>>  predicate.
>>>
>>> 2017-06-06  Kyrylo Tkachov  
>>>
>>>  * gcc.target/aarch64/store_v2vec_lanes.c: New test.
>>
>>
>


Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

2017-11-08 Thread Kyrill Tkachov


On 06/06/17 14:17, James Greenhalgh wrote:

On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:

Hi all,

On top of the previous vec_merge simplifications [1] we can add this pattern
to perform a store of a vec_concat of two 64-bit values in distinct registers
as an STP.
This avoids constructing such a vector explicitly in a register and storing
it as a Q register.
This way for the code in the testcase we can generate:

construct_lane_1:
 ldp d1, d0, [x0]
 fmov d3, 1.0e+0
 fmov d2, 2.0e+0
 fadd d4, d1, d3
 fadd d5, d0, d2
 stp d4, d5, [x1, 32]
 ret

construct_lane_2:
 ldp x2, x0, [x0]
 add x3, x2, 1
 add x4, x0, 2
 stp x3, x4, [x1, 32]
 ret

instead of the current:
construct_lane_1:
 ldp d0, d1, [x0]
 fmov d3, 1.0e+0
 fmov d2, 2.0e+0
 fadd d0, d0, d3
 fadd d1, d1, d2
 dup v0.2d, v0.d[0]
 ins v0.d[1], v1.d[0]
 str q0, [x1, 32]
 ret

construct_lane_2:
 ldp x2, x3, [x0]
 add x0, x2, 1
 add x2, x3, 2
 dup v0.2d, x0
 ins v0.d[1], x2
 str q0, [x1, 32]
 ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?

OK.


Thanks, I've committed this and the other patches in this series after
rebasing and rebootstrapping and testing on aarch64-none-linux-gnu.
The only conflict from updating the patch was that I had to use the store_16
attribute rather than the old store2 for the new define_insn. This is what
I've committed with r254551.


Sorry for the delay in committing.

Kyrill


Thanks,
James


2017-06-06  Kyrylo Tkachov  

 * config/aarch64/aarch64-simd.md (store_pair_lanes):
 New pattern.
 * config/aarch64/constraints.md (Uml): New constraint.
 * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
 predicate.

2017-06-06  Kyrylo Tkachov  

 * gcc.target/aarch64/store_v2vec_lanes.c: New test.




diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b78affe9b06ffc97322a4fcf1ec8e80ecdf6..0847e7aff230782874565e565929c10053807839 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2817,6 +2817,18 @@ (define_insn "load_pair_lanes"
   [(set_attr "type" "neon_load1_1reg_q")]
 )
 
+(define_insn "store_pair_lanes"
+  [(set (match_operand:<VDBL> 0 "aarch64_mem_pair_lanes_operand" "=Uml, Uml")
+	(vec_concat:<VDBL>
+	   (match_operand:VDC 1 "register_operand" "w, r")
+	   (match_operand:VDC 2 "register_operand" "w, r")))]
+  "TARGET_SIMD"
+  "@
+   stp\\t%d1, %d2, %0
+   stp\\t%x1, %x2, %0"
+  [(set_attr "type" "neon_stp, store_16")]
+)
+
 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
 ;; dest vector.
 
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index ab607b9f7488e903a14fe93e88d4c4e1fad762b3..6bdb93f07a7c8d19675971ccfc71e681fd2dae66 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -154,6 +154,15 @@ (define_memory_constraint "Ump"
(match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 		  PARALLEL, 1)")))
 
+;; Used for storing two 64-bit values in an AdvSIMD register using an STP
+;; as a 128-bit vec_concat.
+(define_memory_constraint "Uml"
+  "@internal
+  A memory address suitable for a load/store pair operation."
+  (and (match_code "mem")
+   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0),
+		   PARALLEL, 1)")))
+
 (define_memory_constraint "Utv"
   "@internal
An address valid for loading/storing opaque structure
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 16e864765cde3bf8f54a11e5fc8db5a53606db30..c175f9d9a5f969817782bede858b1834ac642e28 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -173,6 +173,13 @@ (define_predicate "aarch64_mem_pair_operand"
(match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
 	   0)")))
 
+;; Used for storing two 64-bit values in an AdvSIMD register using an STP
+;; as a 128-bit vec_concat.
+(define_predicate "aarch64_mem_pair_lanes_operand"
+  (and (match_code "mem")
+   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0),
+		   PARALLEL, 1)")))
+
 (define_predicate "aarch64_prefetch_operand"
   (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
 
diff --git a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
new file mode 100644
index ..6810db3c54dce81777caf062177facedb464d1d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
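
A note on the "memory address suitable for a load/store pair operation" wording in the Uml constraint above. Architecturally, a 64-bit LDP/STP with base-plus-immediate addressing encodes a 7-bit signed immediate scaled by 8, so the byte offset must be a multiple of 8 in the range -512 to 504. The sketch below covers only that offset rule. The name stp64_offset_ok is hypothetical and is not a GCC function; aarch64_legitimate_address_p checks considerably more than this (for example the base register).

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: true if OFFSET (in bytes) can be
   encoded as the immediate of a 64-bit LDP/STP using signed-offset
   addressing, i.e. a 7-bit signed value scaled by 8.  */
static bool
stp64_offset_ok (long offset)
{
  return offset % 8 == 0 && offset >= -512 && offset <= 504;
}
```

The offset 32 used in the "stp d4, d5, [x1, 32]" example passes this check, so the offset itself is within what STP can encode.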

Re: [PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

2017-06-06 Thread James Greenhalgh
On Tue, Jun 06, 2017 at 09:40:44AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> On top of the previous vec_merge simplifications [1] we can add this pattern
> to perform a store of a vec_concat of two 64-bit values in distinct registers
> as an STP.
> This avoids constructing such a vector explicitly in a register and storing
> it as a Q register.
> This way for the code in the testcase we can generate:
> 
> construct_lane_1:
> ldp d1, d0, [x0]
> fmov d3, 1.0e+0
> fmov d2, 2.0e+0
> fadd d4, d1, d3
> fadd d5, d0, d2
> stp d4, d5, [x1, 32]
> ret
> 
> construct_lane_2:
> ldp x2, x0, [x0]
> add x3, x2, 1
> add x4, x0, 2
> stp x3, x4, [x1, 32]
> ret
> 
> instead of the current:
> construct_lane_1:
> ldp d0, d1, [x0]
> fmov d3, 1.0e+0
> fmov d2, 2.0e+0
> fadd d0, d0, d3
> fadd d1, d1, d2
> dup v0.2d, v0.d[0]
> ins v0.d[1], v1.d[0]
> str q0, [x1, 32]
> ret
> 
> construct_lane_2:
> ldp x2, x3, [x0]
> add x0, x2, 1
> add x2, x3, 2
> dup v0.2d, x0
> ins v0.d[1], x2
> str q0, [x1, 32]
> ret
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for GCC 8?

OK.

Thanks,
James

> 2017-06-06  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64-simd.md (store_pair_lanes):
> New pattern.
> * config/aarch64/constraints.md (Uml): New constraint.
> * config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
> predicate.
> 
> 2017-06-06  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/store_v2vec_lanes.c: New test.




[PATCH][AArch64] Add STP pattern to store a vec_concat of two 64-bit registers

2017-06-06 Thread Kyrill Tkachov

Hi all,

On top of the previous vec_merge simplifications [1] we can add this pattern
to perform a store of a vec_concat of two 64-bit values in distinct registers
as an STP.
This avoids constructing such a vector explicitly in a register and storing
it as a Q register.
This way for the code in the testcase we can generate:

construct_lane_1:
ldp d1, d0, [x0]
fmov d3, 1.0e+0
fmov d2, 2.0e+0
fadd d4, d1, d3
fadd d5, d0, d2
stp d4, d5, [x1, 32]
ret

construct_lane_2:
ldp x2, x0, [x0]
add x3, x2, 1
add x4, x0, 2
stp x3, x4, [x1, 32]
ret

instead of the current:
construct_lane_1:
ldp d0, d1, [x0]
fmov d3, 1.0e+0
fmov d2, 2.0e+0
fadd d0, d0, d3
fadd d1, d1, d2
dup v0.2d, v0.d[0]
ins v0.d[1], v1.d[0]
str q0, [x1, 32]
ret

construct_lane_2:
ldp x2, x3, [x0]
add x0, x2, 1
add x2, x3, 2
dup v0.2d, x0
ins v0.d[1], x2
str q0, [x1, 32]
ret

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for GCC 8?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00272.html
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00273.html
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00274.html
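
To illustrate why an STP of the two registers can replace building the vector and storing a Q register: in memory, a 128-bit vector made of two 64-bit lanes is just the two values at adjacent 8-byte slots, lane 0 first. A minimal sketch using the GCC vector extension follows. The function names are hypothetical and the code only demonstrates the memory-layout equivalence; it says nothing about which instructions a given compiler emits.

```c
#include <assert.h>
#include <string.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Build the vector {lo, hi} and store it in one go, as the testcase does.  */
static void
store_as_vector (double lo, double hi, v2df *dst)
{
  v2df x = {lo, hi};
  *dst = x;
}

/* Write the same two values as two adjacent 64-bit stores, which is the
   memory effect of an STP of two D registers.  */
static void
store_as_pair (double lo, double hi, v2df *dst)
{
  memcpy ((char *) dst, &lo, sizeof lo);
  memcpy ((char *) dst + sizeof lo, &hi, sizeof hi);
}

/* Return nonzero if both ways of storing leave identical bytes in memory.  */
static int
same_bytes (double lo, double hi)
{
  v2df a, b;

  assert (sizeof (v2df) == 2 * sizeof (double));
  store_as_vector (lo, hi, &a);
  store_as_pair (lo, hi, &b);
  return memcmp (&a, &b, sizeof (v2df)) == 0;
}
```

Because both functions leave the same 16 bytes in memory, a pattern that recognises the store of a vec_concat is free to emit the pair store directly.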

2017-06-06  Kyrylo Tkachov  

* config/aarch64/aarch64-simd.md (store_pair_lanes):
New pattern.
* config/aarch64/constraints.md (Uml): New constraint.
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): New
predicate.

2017-06-06  Kyrylo Tkachov  

* gcc.target/aarch64/store_v2vec_lanes.c: New test.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b78affe9b06ffc97322a4fcf1ec8e80ecdf6..0847e7aff230782874565e565929c10053807839 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2817,6 +2817,18 @@ (define_insn "load_pair_lanes"
   [(set_attr "type" "neon_load1_1reg_q")]
 )
 
+(define_insn "store_pair_lanes"
+  [(set (match_operand:<VDBL> 0 "aarch64_mem_pair_lanes_operand" "=Uml, Uml")
+	(vec_concat:<VDBL>
+	   (match_operand:VDC 1 "register_operand" "w, r")
+	   (match_operand:VDC 2 "register_operand" "w, r")))]
+  "TARGET_SIMD"
+  "@
+   stp\\t%d1, %d2, %0
+   stp\\t%x1, %x2, %0"
+  [(set_attr "type" "neon_stp, store2")]
+)
+
 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
 ;; dest vector.
 
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index ab607b9f7488e903a14fe93e88d4c4e1fad762b3..6bdb93f07a7c8d19675971ccfc71e681fd2dae66 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -154,6 +154,15 @@ (define_memory_constraint "Ump"
(match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 		  PARALLEL, 1)")))
 
+;; Used for storing two 64-bit values in an AdvSIMD register using an STP
+;; as a 128-bit vec_concat.
+(define_memory_constraint "Uml"
+  "@internal
+  A memory address suitable for a load/store pair operation."
+  (and (match_code "mem")
+   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0),
+		   PARALLEL, 1)")))
+
 (define_memory_constraint "Utv"
   "@internal
An address valid for loading/storing opaque structure
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 16e864765cde3bf8f54a11e5fc8db5a53606db30..c175f9d9a5f969817782bede858b1834ac642e28 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -173,6 +173,13 @@ (define_predicate "aarch64_mem_pair_operand"
(match_test "aarch64_legitimate_address_p (mode, XEXP (op, 0), PARALLEL,
 	   0)")))
 
+;; Used for storing two 64-bit values in an AdvSIMD register using an STP
+;; as a 128-bit vec_concat.
+(define_predicate "aarch64_mem_pair_lanes_operand"
+  (and (match_code "mem")
+   (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0),
+		   PARALLEL, 1)")))
+
 (define_predicate "aarch64_prefetch_operand"
   (match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
 
diff --git a/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
new file mode 100644
index ..6810db3c54dce81777caf062177facedb464d1d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_v2vec_lanes.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+void
+construct_lane_1 (double *y, v2df *z)
+{
+  double y0 = y[0] + 1;
+  double y1 = y[1] + 2;
+  v2df x = {y0, y1};
+  z[2] = x;
+}
+
+void
+construct_lane_2 (long long *y, v2di *z)
+{
+