Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-05-03 Thread Alex Coplan
On 22/04/2024 13:01, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 14/04/24 10:29 pm, Ajit Agarwal wrote:
> > Hello Alex:
> > 
> > On 12/04/24 11:02 pm, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 12/04/24 8:15 pm, Alex Coplan wrote:
> >>> On 12/04/2024 20:02, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 11/04/24 7:55 pm, Alex Coplan wrote:
> > On 10/04/2024 23:48, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 10/04/2024 15:31, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 10/04/24 1:42 pm, Alex Coplan wrote:
> > Hi Ajit,
> >
> > On 09/04/2024 20:59, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>> On 09/04/2024 20:01, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>  Hello Alex/Richard:
> 
>  All review comments are incorporated.
> >>> 
>  @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>  _list,
>   // of accesses.  If we find two sets of adjacent accesses, 
>  call
>   // merge_pairs.
>   void
>  -ldp_bb_info::transform_for_base (int encoded_lfs,
>  - access_group )
>  +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>  + access_group )
>   {
> const auto lfs = decode_lfs (encoded_lfs);
> const unsigned access_size = lfs.size;
>  @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>  encoded_lfs,
>  access.cand_insns,
>  lfs.load_p,
>  access_size);
>  -  skip_next = access.cand_insns.empty ();
>  +  skip_next = bb_state->cand_insns_empty_p 
>  (access.cand_insns);
> >>>
> >>> As above, why is this needed?
> >>
> >> For rs6000 we want to return always true. as load store pair
> >> that are to be merged with 8/16 16/32 32/64 is occuring for 
> >> rs6000.
> >> And we want load store pair to 8/16 32/64. Thats why we want
> >> to generate always true for rs6000 to skip pairs as above.
> >
> > Hmm, sorry, I'm not sure I follow.  Are you saying that for 
> > rs6000 you have
> > load/store pair instructions where the two arms of the access 
> > are storing
> > operands of different sizes?  Or something else?
> >
> > As it stands the logic is to skip the next iteration only if we
> > exhausted all the candidate insns for the current access.  In 
> > the case
> > that we didn't exhaust all such candidates, then the idea is 
> > that when
> > access becomes prev_access, we can attempt to use those 
> > candidates as
> > the "left-hand side" of a pair in the next iteration since we 
> > failed to
> > use them as the "right-hand side" of a pair in the current 
> > iteration.
> > I don't see why you wouldn't want that behaviour.  Please can 
> > you
> > explain?
> >
> 
>  In merge_pair we get the 2 load candiates one load from 0 offset 
>  and
>  other load is from 16th offset. Then in next iteration we get 
>  load
>  from 16th offset and other load from 32 offset. In next iteration
>  we get load from 32 offset and other load from 48 offset.
> 
>  For example:
> 
>  Currently we get the load candiates as follows.
> 
>  pairs:
> 
>  load from 0th offset.
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 16th offset.
>  load from 32th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> 
>  Instead in rs6000 we should get:
> 
>  pairs:
> 
>  load from 0th offset
>  load from 16th 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-22 Thread Ajit Agarwal
Hello Alex:

On 14/04/24 10:29 pm, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 12/04/24 11:02 pm, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 12/04/24 8:15 pm, Alex Coplan wrote:
>>> On 12/04/2024 20:02, Ajit Agarwal wrote:
 Hello Alex:

 On 11/04/24 7:55 pm, Alex Coplan wrote:
> On 10/04/2024 23:48, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 7:52 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 10/04/2024 15:31, Ajit Agarwal wrote:
 Hello Alex:

 On 10/04/24 1:42 pm, Alex Coplan wrote:
> Hi Ajit,
>
> On 09/04/2024 20:59, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 8:39 pm, Alex Coplan wrote:
>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 7:29 pm, Alex Coplan wrote:
> On 09/04/2024 17:30, Ajit Agarwal wrote:
>>
>>
>> On 05/04/24 10:03 pm, Alex Coplan wrote:
>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
 Hello Alex/Richard:

 All review comments are incorporated.
>>> 
 @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
 _list,
  // of accesses.  If we find two sets of adjacent accesses, 
 call
  // merge_pairs.
  void
 -ldp_bb_info::transform_for_base (int encoded_lfs,
 -   access_group )
 +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
 +   access_group )
  {
const auto lfs = decode_lfs (encoded_lfs);
const unsigned access_size = lfs.size;
 @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
 encoded_lfs,
   access.cand_insns,
   lfs.load_p,
   access_size);
 -skip_next = access.cand_insns.empty ();
 +skip_next = bb_state->cand_insns_empty_p 
 (access.cand_insns);
>>>
>>> As above, why is this needed?
>>
>> For rs6000 we want to return always true. as load store pair
>> that are to be merged with 8/16 16/32 32/64 is occuring for 
>> rs6000.
>> And we want load store pair to 8/16 32/64. Thats why we want
>> to generate always true for rs6000 to skip pairs as above.
>
> Hmm, sorry, I'm not sure I follow.  Are you saying that for 
> rs6000 you have
> load/store pair instructions where the two arms of the access are 
> storing
> operands of different sizes?  Or something else?
>
> As it stands the logic is to skip the next iteration only if we
> exhausted all the candidate insns for the current access.  In the 
> case
> that we didn't exhaust all such candidates, then the idea is that 
> when
> access becomes prev_access, we can attempt to use those 
> candidates as
> the "left-hand side" of a pair in the next iteration since we 
> failed to
> use them as the "right-hand side" of a pair in the current 
> iteration.
> I don't see why you wouldn't want that behaviour.  Please can you
> explain?
>

 In merge_pair we get the 2 load candiates one load from 0 offset 
 and
 other load is from 16th offset. Then in next iteration we get load
 from 16th offset and other load from 32 offset. In next iteration
 we get load from 32 offset and other load from 48 offset.

 For example:

 Currently we get the load candiates as follows.

 pairs:

 load from 0th offset.
 load from 16th offset.

 next pairs:

 load from 16th offset.
 load from 32th offset.

 next pairs:

 load from 32th offset
 load from 48th offset.

 Instead in rs6000 we should get:

 pairs:

 load from 0th offset
 load from 16th offset.

 next pairs:

 load from 32th offset
 load from 48th offset.
>>>
>>> Hmm, so then I guess my question is: why wouldn't you consider 
>>> merging
>>> the pair with offsets (16,32) for rs6000?  Is it because you 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-14 Thread Ajit Agarwal
Hello Alex:

On 12/04/24 11:02 pm, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 12/04/24 8:15 pm, Alex Coplan wrote:
>> On 12/04/2024 20:02, Ajit Agarwal wrote:
>>> Hello Alex:
>>>
>>> On 11/04/24 7:55 pm, Alex Coplan wrote:
 On 10/04/2024 23:48, Ajit Agarwal wrote:
> Hello Alex:
>
> On 10/04/24 7:52 pm, Alex Coplan wrote:
>> Hi Ajit,
>>
>> On 10/04/2024 15:31, Ajit Agarwal wrote:
>>> Hello Alex:
>>>
>>> On 10/04/24 1:42 pm, Alex Coplan wrote:
 Hi Ajit,

 On 09/04/2024 20:59, Ajit Agarwal wrote:
> Hello Alex:
>
> On 09/04/24 8:39 pm, Alex Coplan wrote:
>> On 09/04/2024 20:01, Ajit Agarwal wrote:
>>> Hello Alex:
>>>
>>> On 09/04/24 7:29 pm, Alex Coplan wrote:
 On 09/04/2024 17:30, Ajit Agarwal wrote:
>
>
> On 05/04/24 10:03 pm, Alex Coplan wrote:
>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>>> Hello Alex/Richard:
>>>
>>> All review comments are incorporated.
>> 
>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>>> _list,
>>>  // of accesses.  If we find two sets of adjacent accesses, call
>>>  // merge_pairs.
>>>  void
>>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>>> -access_group )
>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>>> +access_group )
>>>  {
>>>const auto lfs = decode_lfs (encoded_lfs);
>>>const unsigned access_size = lfs.size;
>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>>> encoded_lfs,
>>>access.cand_insns,
>>>lfs.load_p,
>>>access_size);
>>> - skip_next = access.cand_insns.empty ();
>>> + skip_next = bb_state->cand_insns_empty_p 
>>> (access.cand_insns);
>>
>> As above, why is this needed?
>
> For rs6000 we want to return always true. as load store pair
> that are to be merged with 8/16 16/32 32/64 is occuring for 
> rs6000.
> And we want load store pair to 8/16 32/64. Thats why we want
> to generate always true for rs6000 to skip pairs as above.

 Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
 you have
 load/store pair instructions where the two arms of the access are 
 storing
 operands of different sizes?  Or something else?

 As it stands the logic is to skip the next iteration only if we
 exhausted all the candidate insns for the current access.  In the 
 case
 that we didn't exhaust all such candidates, then the idea is that 
 when
 access becomes prev_access, we can attempt to use those candidates 
 as
 the "left-hand side" of a pair in the next iteration since we 
 failed to
 use them as the "right-hand side" of a pair in the current 
 iteration.
 I don't see why you wouldn't want that behaviour.  Please can you
 explain?

>>>
>>> In merge_pair we get the 2 load candiates one load from 0 offset and
>>> other load is from 16th offset. Then in next iteration we get load
>>> from 16th offset and other load from 32 offset. In next iteration
>>> we get load from 32 offset and other load from 48 offset.
>>>
>>> For example:
>>>
>>> Currently we get the load candiates as follows.
>>>
>>> pairs:
>>>
>>> load from 0th offset.
>>> load from 16th offset.
>>>
>>> next pairs:
>>>
>>> load from 16th offset.
>>> load from 32th offset.
>>>
>>> next pairs:
>>>
>>> load from 32th offset
>>> load from 48th offset.
>>>
>>> Instead in rs6000 we should get:
>>>
>>> pairs:
>>>
>>> load from 0th offset
>>> load from 16th offset.
>>>
>>> next pairs:
>>>
>>> load from 32th offset
>>> load from 48th offset.
>>
>> Hmm, so then I guess my question is: why wouldn't you consider 
>> merging
>> the pair with offsets (16,32) for rs6000?  Is it because you have a
>> stricter alignment requirement on the base pair offsets (i.e. they 
>> have
>> to be a multiple of 32 when the operand size is 16)?  So the pair
>> offsets have to be a 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Ajit Agarwal
Hello Alex:

On 12/04/24 8:15 pm, Alex Coplan wrote:
> On 12/04/2024 20:02, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 11/04/24 7:55 pm, Alex Coplan wrote:
>>> On 10/04/2024 23:48, Ajit Agarwal wrote:
 Hello Alex:

 On 10/04/24 7:52 pm, Alex Coplan wrote:
> Hi Ajit,
>
> On 10/04/2024 15:31, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 1:42 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 8:39 pm, Alex Coplan wrote:
> On 09/04/2024 20:01, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 7:29 pm, Alex Coplan wrote:
>>> On 09/04/2024 17:30, Ajit Agarwal wrote:


 On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
> 
>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>> _list,
>>  // of accesses.  If we find two sets of adjacent accesses, call
>>  // merge_pairs.
>>  void
>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>> - access_group )
>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>> + access_group )
>>  {
>>const auto lfs = decode_lfs (encoded_lfs);
>>const unsigned access_size = lfs.size;
>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>> encoded_lfs,
>> access.cand_insns,
>> lfs.load_p,
>> access_size);
>> -  skip_next = access.cand_insns.empty ();
>> +  skip_next = bb_state->cand_insns_empty_p 
>> (access.cand_insns);
>
> As above, why is this needed?

 For rs6000 we want to return always true. as load store pair
 that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
 And we want load store pair to 8/16 32/64. Thats why we want
 to generate always true for rs6000 to skip pairs as above.
>>>
>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
>>> you have
>>> load/store pair instructions where the two arms of the access are 
>>> storing
>>> operands of different sizes?  Or something else?
>>>
>>> As it stands the logic is to skip the next iteration only if we
>>> exhausted all the candidate insns for the current access.  In the 
>>> case
>>> that we didn't exhaust all such candidates, then the idea is that 
>>> when
>>> access becomes prev_access, we can attempt to use those candidates 
>>> as
>>> the "left-hand side" of a pair in the next iteration since we 
>>> failed to
>>> use them as the "right-hand side" of a pair in the current 
>>> iteration.
>>> I don't see why you wouldn't want that behaviour.  Please can you
>>> explain?
>>>
>>
>> In merge_pair we get the 2 load candiates one load from 0 offset and
>> other load is from 16th offset. Then in next iteration we get load
>> from 16th offset and other load from 32 offset. In next iteration
>> we get load from 32 offset and other load from 48 offset.
>>
>> For example:
>>
>> Currently we get the load candiates as follows.
>>
>> pairs:
>>
>> load from 0th offset.
>> load from 16th offset.
>>
>> next pairs:
>>
>> load from 16th offset.
>> load from 32th offset.
>>
>> next pairs:
>>
>> load from 32th offset
>> load from 48th offset.
>>
>> Instead in rs6000 we should get:
>>
>> pairs:
>>
>> load from 0th offset
>> load from 16th offset.
>>
>> next pairs:
>>
>> load from 32th offset
>> load from 48th offset.
>
> Hmm, so then I guess my question is: why wouldn't you consider merging
> the pair with offsets (16,32) for rs6000?  Is it because you have a
> stricter alignment requirement on the base pair offsets (i.e. they 
> have
> to be a multiple of 32 when the operand size is 16)?  So the pair
> offsets have to be a multiple of the entire pair size rather than a
> single operand size> 

 We get load pair at a certain point with (0,16) and other program
 point we get load pair (32, 48).

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Alex Coplan
On 12/04/2024 20:02, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 11/04/24 7:55 pm, Alex Coplan wrote:
> > On 10/04/2024 23:48, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 7:52 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 10/04/2024 15:31, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 10/04/24 1:42 pm, Alex Coplan wrote:
> > Hi Ajit,
> >
> > On 09/04/2024 20:59, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>> On 09/04/2024 20:01, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>  Hello Alex/Richard:
> 
>  All review comments are incorporated.
> >>> 
>  @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>  _list,
>   // of accesses.  If we find two sets of adjacent accesses, call
>   // merge_pairs.
>   void
>  -ldp_bb_info::transform_for_base (int encoded_lfs,
>  - access_group )
>  +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>  + access_group )
>   {
> const auto lfs = decode_lfs (encoded_lfs);
> const unsigned access_size = lfs.size;
>  @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>  encoded_lfs,
>  access.cand_insns,
>  lfs.load_p,
>  access_size);
>  -  skip_next = access.cand_insns.empty ();
>  +  skip_next = bb_state->cand_insns_empty_p 
>  (access.cand_insns);
> >>>
> >>> As above, why is this needed?
> >>
> >> For rs6000 we want to return always true. as load store pair
> >> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
> >> And we want load store pair to 8/16 32/64. Thats why we want
> >> to generate always true for rs6000 to skip pairs as above.
> >
> > Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> > you have
> > load/store pair instructions where the two arms of the access are 
> > storing
> > operands of different sizes?  Or something else?
> >
> > As it stands the logic is to skip the next iteration only if we
> > exhausted all the candidate insns for the current access.  In the 
> > case
> > that we didn't exhaust all such candidates, then the idea is that 
> > when
> > access becomes prev_access, we can attempt to use those candidates 
> > as
> > the "left-hand side" of a pair in the next iteration since we 
> > failed to
> > use them as the "right-hand side" of a pair in the current 
> > iteration.
> > I don't see why you wouldn't want that behaviour.  Please can you
> > explain?
> >
> 
>  In merge_pair we get the 2 load candiates one load from 0 offset and
>  other load is from 16th offset. Then in next iteration we get load
>  from 16th offset and other load from 32 offset. In next iteration
>  we get load from 32 offset and other load from 48 offset.
> 
>  For example:
> 
>  Currently we get the load candiates as follows.
> 
>  pairs:
> 
>  load from 0th offset.
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 16th offset.
>  load from 32th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> 
>  Instead in rs6000 we should get:
> 
>  pairs:
> 
>  load from 0th offset
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> >>>
> >>> Hmm, so then I guess my question is: why wouldn't you consider merging
> >>> the pair with offsets (16,32) for rs6000?  Is it because you have a
> >>> stricter alignment requirement on the base pair offsets (i.e. they 
> >>> have
> >>> to be a multiple of 32 when the operand size is 16)?  So the pair
> >>> offsets have to be a multiple of the entire pair size rather than a
> >>> single operand size> 
> >>
> >> We get load pair at a certain point with (0,16) and other program
> >> point we get load pair (32, 48).
> >>
> >> In current implementation it takes 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-12 Thread Ajit Agarwal
Hello Alex:

On 11/04/24 7:55 pm, Alex Coplan wrote:
> On 10/04/2024 23:48, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 7:52 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 10/04/2024 15:31, Ajit Agarwal wrote:
 Hello Alex:

 On 10/04/24 1:42 pm, Alex Coplan wrote:
> Hi Ajit,
>
> On 09/04/2024 20:59, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 8:39 pm, Alex Coplan wrote:
>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 7:29 pm, Alex Coplan wrote:
> On 09/04/2024 17:30, Ajit Agarwal wrote:
>>
>>
>> On 05/04/24 10:03 pm, Alex Coplan wrote:
>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
 Hello Alex/Richard:

 All review comments are incorporated.
>>> 
 @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
 _list,
  // of accesses.  If we find two sets of adjacent accesses, call
  // merge_pairs.
  void
 -ldp_bb_info::transform_for_base (int encoded_lfs,
 -   access_group )
 +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
 +   access_group )
  {
const auto lfs = decode_lfs (encoded_lfs);
const unsigned access_size = lfs.size;
 @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
 encoded_lfs,
   access.cand_insns,
   lfs.load_p,
   access_size);
 -skip_next = access.cand_insns.empty ();
 +skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
>>>
>>> As above, why is this needed?
>>
>> For rs6000 we want to return always true. as load store pair
>> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
>> And we want load store pair to 8/16 32/64. Thats why we want
>> to generate always true for rs6000 to skip pairs as above.
>
> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> you have
> load/store pair instructions where the two arms of the access are 
> storing
> operands of different sizes?  Or something else?
>
> As it stands the logic is to skip the next iteration only if we
> exhausted all the candidate insns for the current access.  In the case
> that we didn't exhaust all such candidates, then the idea is that when
> access becomes prev_access, we can attempt to use those candidates as
> the "left-hand side" of a pair in the next iteration since we failed 
> to
> use them as the "right-hand side" of a pair in the current iteration.
> I don't see why you wouldn't want that behaviour.  Please can you
> explain?
>

 In merge_pair we get the 2 load candiates one load from 0 offset and
 other load is from 16th offset. Then in next iteration we get load
 from 16th offset and other load from 32 offset. In next iteration
 we get load from 32 offset and other load from 48 offset.

 For example:

 Currently we get the load candiates as follows.

 pairs:

 load from 0th offset.
 load from 16th offset.

 next pairs:

 load from 16th offset.
 load from 32th offset.

 next pairs:

 load from 32th offset
 load from 48th offset.

 Instead in rs6000 we should get:

 pairs:

 load from 0th offset
 load from 16th offset.

 next pairs:

 load from 32th offset
 load from 48th offset.
>>>
>>> Hmm, so then I guess my question is: why wouldn't you consider merging
>>> the pair with offsets (16,32) for rs6000?  Is it because you have a
>>> stricter alignment requirement on the base pair offsets (i.e. they have
>>> to be a multiple of 32 when the operand size is 16)?  So the pair
>>> offsets have to be a multiple of the entire pair size rather than a
>>> single operand size> 
>>
>> We get load pair at a certain point with (0,16) and other program
>> point we get load pair (32, 48).
>>
>> In current implementation it takes offsets loads as (0, 16),
>> (16, 32), (32, 48).
>>
>> But In rs6000 we want  the load pair to be merged at different points
>> as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with
>> 0 offset and other load (32, 48) with lxvp with 32 offset.
>>
>> In current case it will merge with lxvp with 0 offset and lxvp with
>> 16 offset, then lxvp with 32 offset and lxvp 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-11 Thread Alex Coplan
On 10/04/2024 23:48, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 10/04/24 7:52 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > On 10/04/2024 15:31, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 10/04/24 1:42 pm, Alex Coplan wrote:
> >>> Hi Ajit,
> >>>
> >>> On 09/04/2024 20:59, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 09/04/24 8:39 pm, Alex Coplan wrote:
> > On 09/04/2024 20:01, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> 
> 
>  On 05/04/24 10:03 pm, Alex Coplan wrote:
> > On 05/04/2024 13:53, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are incorporated.
> > 
> >> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
> >> _list,
> >>  // of accesses.  If we find two sets of adjacent accesses, call
> >>  // merge_pairs.
> >>  void
> >> -ldp_bb_info::transform_for_base (int encoded_lfs,
> >> -   access_group )
> >> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
> >> +   access_group )
> >>  {
> >>const auto lfs = decode_lfs (encoded_lfs);
> >>const unsigned access_size = lfs.size;
> >> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
> >> encoded_lfs,
> >>   access.cand_insns,
> >>   lfs.load_p,
> >>   access_size);
> >> -skip_next = access.cand_insns.empty ();
> >> +skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >
> > As above, why is this needed?
> 
>  For rs6000 we want to return always true. as load store pair
>  that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
>  And we want load store pair to 8/16 32/64. Thats why we want
>  to generate always true for rs6000 to skip pairs as above.
> >>>
> >>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 
> >>> you have
> >>> load/store pair instructions where the two arms of the access are 
> >>> storing
> >>> operands of different sizes?  Or something else?
> >>>
> >>> As it stands the logic is to skip the next iteration only if we
> >>> exhausted all the candidate insns for the current access.  In the case
> >>> that we didn't exhaust all such candidates, then the idea is that when
> >>> access becomes prev_access, we can attempt to use those candidates as
> >>> the "left-hand side" of a pair in the next iteration since we failed 
> >>> to
> >>> use them as the "right-hand side" of a pair in the current iteration.
> >>> I don't see why you wouldn't want that behaviour.  Please can you
> >>> explain?
> >>>
> >>
> >> In merge_pair we get the 2 load candiates one load from 0 offset and
> >> other load is from 16th offset. Then in next iteration we get load
> >> from 16th offset and other load from 32 offset. In next iteration
> >> we get load from 32 offset and other load from 48 offset.
> >>
> >> For example:
> >>
> >> Currently we get the load candiates as follows.
> >>
> >> pairs:
> >>
> >> load from 0th offset.
> >> load from 16th offset.
> >>
> >> next pairs:
> >>
> >> load from 16th offset.
> >> load from 32th offset.
> >>
> >> next pairs:
> >>
> >> load from 32th offset
> >> load from 48th offset.
> >>
> >> Instead in rs6000 we should get:
> >>
> >> pairs:
> >>
> >> load from 0th offset
> >> load from 16th offset.
> >>
> >> next pairs:
> >>
> >> load from 32th offset
> >> load from 48th offset.
> >
> > Hmm, so then I guess my question is: why wouldn't you consider merging
> > the pair with offsets (16,32) for rs6000?  Is it because you have a
> > stricter alignment requirement on the base pair offsets (i.e. they have
> > to be a multiple of 32 when the operand size is 16)?  So the pair
> > offsets have to be a multiple of the entire pair size rather than a
> > single operand size> 
> 
>  We get load pair at a certain point with (0,16) and other program
>  point we get load pair (32, 48).
> 
>  In current implementation it takes offsets loads as (0, 16),
>  (16, 32), (32, 48).
> 
>  But In rs6000 we want  the load pair to be merged at different points
>  as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with
>  0 offset and other load (32, 48) with lxvp with 32 offset.
> 
>  In current case it will merge with lxvp with 0 offset and lxvp with
>  16 offset, then lxvp with 32 offset and lxvp with 48 offset which
>  is incorrect in our case as 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Ajit Agarwal
Hello Alex:

On 10/04/24 7:52 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> On 10/04/2024 15:31, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 10/04/24 1:42 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> On 09/04/2024 20:59, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 8:39 pm, Alex Coplan wrote:
> On 09/04/2024 20:01, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 7:29 pm, Alex Coplan wrote:
>>> On 09/04/2024 17:30, Ajit Agarwal wrote:


 On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
> 
>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>> _list,
>>  // of accesses.  If we find two sets of adjacent accesses, call
>>  // merge_pairs.
>>  void
>> -ldp_bb_info::transform_for_base (int encoded_lfs,
>> - access_group )
>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>> + access_group )
>>  {
>>const auto lfs = decode_lfs (encoded_lfs);
>>const unsigned access_size = lfs.size;
>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>> encoded_lfs,
>> access.cand_insns,
>> lfs.load_p,
>> access_size);
>> -  skip_next = access.cand_insns.empty ();
>> +  skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
>
> As above, why is this needed?

 For rs6000 we want to return always true. as load store pair
 that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
 And we want load store pair to 8/16 32/64. Thats why we want
 to generate always true for rs6000 to skip pairs as above.
>>>
>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 you 
>>> have
>>> load/store pair instructions where the two arms of the access are 
>>> storing
>>> operands of different sizes?  Or something else?
>>>
>>> As it stands the logic is to skip the next iteration only if we
>>> exhausted all the candidate insns for the current access.  In the case
>>> that we didn't exhaust all such candidates, then the idea is that when
>>> access becomes prev_access, we can attempt to use those candidates as
>>> the "left-hand side" of a pair in the next iteration since we failed to
>>> use them as the "right-hand side" of a pair in the current iteration.
>>> I don't see why you wouldn't want that behaviour.  Please can you
>>> explain?
>>>
>>
>> In merge_pair we get the 2 load candiates one load from 0 offset and
>> other load is from 16th offset. Then in next iteration we get load
>> from 16th offset and other load from 32 offset. In next iteration
>> we get load from 32 offset and other load from 48 offset.
>>
>> For example:
>>
>> Currently we get the load candiates as follows.
>>
>> pairs:
>>
>> load from 0th offset.
>> load from 16th offset.
>>
>> next pairs:
>>
>> load from 16th offset.
>> load from 32th offset.
>>
>> next pairs:
>>
>> load from 32th offset
>> load from 48th offset.
>>
>> Instead in rs6000 we should get:
>>
>> pairs:
>>
>> load from 0th offset
>> load from 16th offset.
>>
>> next pairs:
>>
>> load from 32th offset
>> load from 48th offset.
>
> Hmm, so then I guess my question is: why wouldn't you consider merging
> the pair with offsets (16,32) for rs6000?  Is it because you have a
> stricter alignment requirement on the base pair offsets (i.e. they have
> to be a multiple of 32 when the operand size is 16)?  So the pair
> offsets have to be a multiple of the entire pair size rather than a
> single operand size> 

 We get load pair at a certain point with (0,16) and other program
 point we get load pair (32, 48).

 In current implementation it takes offsets loads as (0, 16),
 (16, 32), (32, 48).

 But In rs6000 we want  the load pair to be merged at different points
 as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with
 0 offset and other load (32, 48) with lxvp with 32 offset.

 In current case it will merge with lxvp with 0 offset and lxvp with
 16 offset, then lxvp with 32 offset and lxvp with 48 offset which
 is incorrect in our case as the (16-32) case 16 offset will not
 load from even register and break for rs6000.
>>>
>>> Sorry, I think I'm still missing something here.  Why does the address 
>>> offset
>>> affect the parity of the tranfser register?  ISTM they needn't be related at
>>> all (and indeed we can't even 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Alex Coplan
Hi Ajit,

On 10/04/2024 15:31, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 10/04/24 1:42 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > On 09/04/2024 20:59, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 8:39 pm, Alex Coplan wrote:
> >>> On 09/04/2024 20:01, Ajit Agarwal wrote:
>  Hello Alex:
> 
>  On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>  Hello Alex/Richard:
> 
>  All review comments are incorporated.

>  @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t 
>  _list,
>   // of accesses.  If we find two sets of adjacent accesses, call
>   // merge_pairs.
>   void
>  -ldp_bb_info::transform_for_base (int encoded_lfs,
>  - access_group )
>  +pair_fusion_bb_info::transform_for_base (int encoded_lfs,
>  + access_group )
>   {
> const auto lfs = decode_lfs (encoded_lfs);
> const unsigned access_size = lfs.size;
>  @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int 
>  encoded_lfs,
>  access.cand_insns,
>  lfs.load_p,
>  access_size);
>  -  skip_next = access.cand_insns.empty ();
>  +  skip_next = bb_state->cand_insns_empty_p (access.cand_insns);
> >>>
> >>> As above, why is this needed?
> >>
> >> For rs6000 we want to return always true. as load store pair
> >> that are to be merged with 8/16 16/32 32/64 is occuring for rs6000.
> >> And we want load store pair to 8/16 32/64. Thats why we want
> >> to generate always true for rs6000 to skip pairs as above.
> >
> > Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000 you 
> > have
> > load/store pair instructions where the two arms of the access are 
> > storing
> > operands of different sizes?  Or something else?
> >
> > As it stands the logic is to skip the next iteration only if we
> > exhausted all the candidate insns for the current access.  In the case
> > that we didn't exhaust all such candidates, then the idea is that when
> > access becomes prev_access, we can attempt to use those candidates as
> > the "left-hand side" of a pair in the next iteration since we failed to
> > use them as the "right-hand side" of a pair in the current iteration.
> > I don't see why you wouldn't want that behaviour.  Please can you
> > explain?
> >
> 
>  In merge_pair we get the 2 load candiates one load from 0 offset and
>  other load is from 16th offset. Then in next iteration we get load
>  from 16th offset and other load from 32 offset. In next iteration
>  we get load from 32 offset and other load from 48 offset.
> 
>  For example:
> 
>  Currently we get the load candiates as follows.
> 
>  pairs:
> 
>  load from 0th offset.
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 16th offset.
>  load from 32th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> 
>  Instead in rs6000 we should get:
> 
>  pairs:
> 
>  load from 0th offset
>  load from 16th offset.
> 
>  next pairs:
> 
>  load from 32th offset
>  load from 48th offset.
> >>>
> >>> Hmm, so then I guess my question is: why wouldn't you consider merging
> >>> the pair with offsets (16,32) for rs6000?  Is it because you have a
> >>> stricter alignment requirement on the base pair offsets (i.e. they have
> >>> to be a multiple of 32 when the operand size is 16)?  So the pair
> >>> offsets have to be a multiple of the entire pair size rather than a
> >>> single operand size> 
> >>
> >> We get load pair at a certain point with (0,16) and other program
> >> point we get load pair (32, 48).
> >>
> >> In current implementation it takes offsets loads as (0, 16),
> >> (16, 32), (32, 48).
> >>
> >> But In rs6000 we want  the load pair to be merged at different points
> >> as (0,16) and (32, 48). for (0,16) we want to replace load lxvp with
> >> 0 offset and other load (32, 48) with lxvp with 32 offset.
> >>
> >> In current case it will merge with lxvp with 0 offset and lxvp with
> >> 16 offset, then lxvp with 32 offset and lxvp with 48 offset which
> >> is incorrect in our case as the (16-32) case 16 offset will not
> >> load from even register and break for rs6000.
> > 
> > Sorry, I think I'm still missing something here.  Why does the address 
> > offset
> > affect the parity of the tranfser register?  ISTM they needn't be related at
> > all (and indeed we can't even know the parity of the transfer register 
> > before
> > RA, 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Ajit Agarwal
Hello Alex:

On 10/04/24 1:42 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> On 09/04/2024 20:59, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 8:39 pm, Alex Coplan wrote:
>>> On 09/04/2024 20:01, Ajit Agarwal wrote:
 Hello Alex:

 On 09/04/24 7:29 pm, Alex Coplan wrote:
> On 09/04/2024 17:30, Ajit Agarwal wrote:
>>
>>
>> On 05/04/24 10:03 pm, Alex Coplan wrote:
>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
 Hello Alex/Richard:

 All review comments are incorporated.
>>>
>>> Thanks, I was kind-of expecting you to also send the renaming patch as a
>>> preparatory patch as we discussed.
>>>
>>> Sorry for another meta comment, but: I think the reason that the Linaro
>>> CI isn't running tests on your patches is actually because you're
>>> sending 1/3 of a series but not sending the rest of the series.
>>>
>>> So please can you either send this as an individual preparatory patch
>>> (not marked as a series) or if you're going to send a series (e.g. with
>>> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
>>> series when you make updates.  That way the CI should test your patches,
>>> which would be helpful.
>>>
>>
>> Addressed.
>>  

 Common infrastructure of load store pair fusion is divided into target
 independent and target dependent changed code.

 Target independent code is the Generic code with pure virtual function
 to interface betwwen target independent and dependent code.

 Target dependent code is the implementation of pure virtual function 
 for
 aarch64 target and the call to target independent code.

 Thanks & Regards
 Ajit


 aarch64: Place target independent and dependent changed code in one 
 file

 Common infrastructure of load store pair fusion is divided into target
 independent and target dependent changed code.

 Target independent code is the Generic code with pure virtual function
 to interface betwwen target independent and dependent code.

 Target dependent code is the implementation of pure virtual function 
 for
 aarch64 target and the call to target independent code.

 2024-04-06  Ajit Kumar Agarwal  

 gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Place target
independent and dependent changed code.
>>>
>>> You're going to need a proper ChangeLog eventually, but I guess there's
>>> no need for that right now.
>>>
 ---
  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
  1 file changed, 249 insertions(+), 122 deletions(-)

 diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
 b/gcc/config/aarch64/aarch64-ldp-fusion.cc
 index 22ed95eb743..cb21b514ef7 100644
 --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
 +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
 @@ -138,8 +138,122 @@ struct alt_base
poly_int64 offset;
  };
  
 +// Virtual base class for load/store walkers used in alias analysis.
 +struct alias_walker
 +{
 +  virtual bool conflict_p (int ) const = 0;
 +  virtual insn_info *insn () const = 0;
 +  virtual bool valid () const  = 0;
>>>
>>> Heh, looking at this made me realise there is a whitespace bug here in
>>> the existing code (double space after const).  Sorry about that!  I'll
>>> push an obvious fix for that.
>>>
 +  virtual void advance () = 0;
 +};
 +
 +struct pair_fusion {
 +
 +  pair_fusion () {};
>>>
>>> This ctor looks pointless at the moment.  Perhaps instead we could put
>>> the contents of ldp_fusion_init in here and then delete that function?
>>>
>>
>> Addressed.
>>
 +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
 + bool load_p) = 0;
>>>
>>> Please can we have comments above each of these virtual functions
>>> describing any parameters, what the purpose of the hook is, and the
>>> interpretation of the return value?  This will serve as the
>>> documentation for other targets that want to make use of the pass.
>>>
>>> It might make sense to have a default-false implementation for
>>> fpsimd_op_p, especially if you don't want to make use of this bit for
>>> rs6000.
>>>
>>
>> Addressed.
>>  
 +
 +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
 +  virtual bool pair_trailing_writeback_p () = 0;
>>>
>>> Sorry for the run-around, but: I think this and
>>> 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Alex Coplan
Hi Ajit,

On 09/04/2024 20:59, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 09/04/24 8:39 pm, Alex Coplan wrote:
> > On 09/04/2024 20:01, Ajit Agarwal wrote:
> >> Hello Alex:
> >>
> >> On 09/04/24 7:29 pm, Alex Coplan wrote:
> >>> On 09/04/2024 17:30, Ajit Agarwal wrote:
> 
> 
>  On 05/04/24 10:03 pm, Alex Coplan wrote:
> > On 05/04/2024 13:53, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are incorporated.
> >
> > Thanks, I was kind-of expecting you to also send the renaming patch as a
> > preparatory patch as we discussed.
> >
> > Sorry for another meta comment, but: I think the reason that the Linaro
> > CI isn't running tests on your patches is actually because you're
> > sending 1/3 of a series but not sending the rest of the series.
> >
> > So please can you either send this as an individual preparatory patch
> > (not marked as a series) or if you're going to send a series (e.g. with
> > a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> > series when you make updates.  That way the CI should test your patches,
> > which would be helpful.
> >
> 
>  Addressed.
>   
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface betwwen target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function 
> >> for
> >> aarch64 target and the call to target independent code.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >>
> >> aarch64: Place target independent and dependent changed code in one 
> >> file
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface betwwen target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function 
> >> for
> >> aarch64 target and the call to target independent code.
> >>
> >> 2024-04-06  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>independent and dependent changed code.
> >
> > You're going to need a proper ChangeLog eventually, but I guess there's
> > no need for that right now.
> >
> >> ---
> >>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
> >>  1 file changed, 249 insertions(+), 122 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> >> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> index 22ed95eb743..cb21b514ef7 100644
> >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> @@ -138,8 +138,122 @@ struct alt_base
> >>poly_int64 offset;
> >>  };
> >>  
> >> +// Virtual base class for load/store walkers used in alias analysis.
> >> +struct alias_walker
> >> +{
> >> +  virtual bool conflict_p (int ) const = 0;
> >> +  virtual insn_info *insn () const = 0;
> >> +  virtual bool valid () const  = 0;
> >
> > Heh, looking at this made me realise there is a whitespace bug here in
> > the existing code (double space after const).  Sorry about that!  I'll
> > push an obvious fix for that.
> >
> >> +  virtual void advance () = 0;
> >> +};
> >> +
> >> +struct pair_fusion {
> >> +
> >> +  pair_fusion () {};
> >
> > This ctor looks pointless at the moment.  Perhaps instead we could put
> > the contents of ldp_fusion_init in here and then delete that function?
> >
> 
>  Addressed.
> 
> >> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >> + bool load_p) = 0;
> >
> > Please can we have comments above each of these virtual functions
> > describing any parameters, what the purpose of the hook is, and the
> > interpretation of the return value?  This will serve as the
> > documentation for other targets that want to make use of the pass.
> >
> > It might make sense to have a default-false implementation for
> > fpsimd_op_p, especially if you don't want to make use of this bit for
> > rs6000.
> >
> 
>  Addressed.
>   
> >> +
> >> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> >> +  virtual bool pair_trailing_writeback_p () = 0;
> >
> > Sorry for the run-around, but: I think this and
> > handle_writeback_opportunities () should be the same function, either
> > returning 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal
Hello Alex:

On 09/04/24 8:39 pm, Alex Coplan wrote:
> On 09/04/2024 20:01, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 09/04/24 7:29 pm, Alex Coplan wrote:
>>> On 09/04/2024 17:30, Ajit Agarwal wrote:


 On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
>
> Thanks, I was kind-of expecting you to also send the renaming patch as a
> preparatory patch as we discussed.
>
> Sorry for another meta comment, but: I think the reason that the Linaro
> CI isn't running tests on your patches is actually because you're
> sending 1/3 of a series but not sending the rest of the series.
>
> So please can you either send this as an individual preparatory patch
> (not marked as a series) or if you're going to send a series (e.g. with
> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> series when you make updates.  That way the CI should test your patches,
> which would be helpful.
>

 Addressed.
  
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64: Place target independent and dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-04-06  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>  independent and dependent changed code.
>
> You're going to need a proper ChangeLog eventually, but I guess there's
> no need for that right now.
>
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
>>  1 file changed, 249 insertions(+), 122 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..cb21b514ef7 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,8 +138,122 @@ struct alt_base
>>poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int ) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const  = 0;
>
> Heh, looking at this made me realise there is a whitespace bug here in
> the existing code (double space after const).  Sorry about that!  I'll
> push an obvious fix for that.
>
>> +  virtual void advance () = 0;
>> +};
>> +
>> +struct pair_fusion {
>> +
>> +  pair_fusion () {};
>
> This ctor looks pointless at the moment.  Perhaps instead we could put
> the contents of ldp_fusion_init in here and then delete that function?
>

 Addressed.

>> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +   bool load_p) = 0;
>
> Please can we have comments above each of these virtual functions
> describing any parameters, what the purpose of the hook is, and the
> interpretation of the return value?  This will serve as the
> documentation for other targets that want to make use of the pass.
>
> It might make sense to have a default-false implementation for
> fpsimd_op_p, especially if you don't want to make use of this bit for
> rs6000.
>

 Addressed.
  
>> +
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +  virtual bool pair_trailing_writeback_p () = 0;
>
> Sorry for the run-around, but: I think this and
> handle_writeback_opportunities () should be the same function, either
> returning an enum or taking an enum and returning a boolean.
>
> At a minimum they should have more similar sounding names.
>

 Addressed.

>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +  machine_mode mem_mode) = 0;
>> +  virtual int 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Alex Coplan
On 09/04/2024 20:01, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 09/04/24 7:29 pm, Alex Coplan wrote:
> > On 09/04/2024 17:30, Ajit Agarwal wrote:
> >>
> >>
> >> On 05/04/24 10:03 pm, Alex Coplan wrote:
> >>> On 05/04/2024 13:53, Ajit Agarwal wrote:
>  Hello Alex/Richard:
> 
>  All review comments are incorporated.
> >>>
> >>> Thanks, I was kind-of expecting you to also send the renaming patch as a
> >>> preparatory patch as we discussed.
> >>>
> >>> Sorry for another meta comment, but: I think the reason that the Linaro
> >>> CI isn't running tests on your patches is actually because you're
> >>> sending 1/3 of a series but not sending the rest of the series.
> >>>
> >>> So please can you either send this as an individual preparatory patch
> >>> (not marked as a series) or if you're going to send a series (e.g. with
> >>> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> >>> series when you make updates.  That way the CI should test your patches,
> >>> which would be helpful.
> >>>
> >>
> >> Addressed.
> >>  
> 
>  Common infrastructure of load store pair fusion is divided into target
>  independent and target dependent changed code.
> 
>  Target independent code is the Generic code with pure virtual function
>  to interface betwwen target independent and dependent code.
> 
>  Target dependent code is the implementation of pure virtual function for
>  aarch64 target and the call to target independent code.
> 
>  Thanks & Regards
>  Ajit
> 
> 
>  aarch64: Place target independent and dependent changed code in one file
> 
>  Common infrastructure of load store pair fusion is divided into target
>  independent and target dependent changed code.
> 
>  Target independent code is the Generic code with pure virtual function
>  to interface betwwen target independent and dependent code.
> 
>  Target dependent code is the implementation of pure virtual function for
>  aarch64 target and the call to target independent code.
> 
>  2024-04-06  Ajit Kumar Agarwal  
> 
>  gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> >>>
> >>> You're going to need a proper ChangeLog eventually, but I guess there's
> >>> no need for that right now.
> >>>
>  ---
>   gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
>   1 file changed, 249 insertions(+), 122 deletions(-)
> 
>  diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>  b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>  index 22ed95eb743..cb21b514ef7 100644
>  --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>  +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>  @@ -138,8 +138,122 @@ struct alt_base
> poly_int64 offset;
>   };
>   
>  +// Virtual base class for load/store walkers used in alias analysis.
>  +struct alias_walker
>  +{
>  +  virtual bool conflict_p (int ) const = 0;
>  +  virtual insn_info *insn () const = 0;
>  +  virtual bool valid () const  = 0;
> >>>
> >>> Heh, looking at this made me realise there is a whitespace bug here in
> >>> the existing code (double space after const).  Sorry about that!  I'll
> >>> push an obvious fix for that.
> >>>
>  +  virtual void advance () = 0;
>  +};
>  +
>  +struct pair_fusion {
>  +
>  +  pair_fusion () {};
> >>>
> >>> This ctor looks pointless at the moment.  Perhaps instead we could put
> >>> the contents of ldp_fusion_init in here and then delete that function?
> >>>
> >>
> >> Addressed.
> >>
>  +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>  +   bool load_p) = 0;
> >>>
> >>> Please can we have comments above each of these virtual functions
> >>> describing any parameters, what the purpose of the hook is, and the
> >>> interpretation of the return value?  This will serve as the
> >>> documentation for other targets that want to make use of the pass.
> >>>
> >>> It might make sense to have a default-false implementation for
> >>> fpsimd_op_p, especially if you don't want to make use of this bit for
> >>> rs6000.
> >>>
> >>
> >> Addressed.
> >>  
>  +
>  +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>  +  virtual bool pair_trailing_writeback_p () = 0;
> >>>
> >>> Sorry for the run-around, but: I think this and
> >>> handle_writeback_opportunities () should be the same function, either
> >>> returning an enum or taking an enum and returning a boolean.
> >>>
> >>> At a minimum they should have more similar sounding names.
> >>>
> >>
> >> Addressed.
> >>
>  +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>  +  machine_mode mem_mode) = 0;
>  +  virtual int pair_mem_alias_check_limit () = 0;
>  +  virtual bool 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal
Hello Alex:

On 09/04/24 7:29 pm, Alex Coplan wrote:
> On 09/04/2024 17:30, Ajit Agarwal wrote:
>>
>>
>> On 05/04/24 10:03 pm, Alex Coplan wrote:
>>> On 05/04/2024 13:53, Ajit Agarwal wrote:
 Hello Alex/Richard:

 All review comments are incorporated.
>>>
>>> Thanks, I was kind-of expecting you to also send the renaming patch as a
>>> preparatory patch as we discussed.
>>>
>>> Sorry for another meta comment, but: I think the reason that the Linaro
>>> CI isn't running tests on your patches is actually because you're
>>> sending 1/3 of a series but not sending the rest of the series.
>>>
>>> So please can you either send this as an individual preparatory patch
>>> (not marked as a series) or if you're going to send a series (e.g. with
>>> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
>>> series when you make updates.  That way the CI should test your patches,
>>> which would be helpful.
>>>
>>
>> Addressed.
>>  

 Common infrastructure of load store pair fusion is divided into target
 independent and target dependent changed code.

 Target independent code is the Generic code with pure virtual function
 to interface betwwen target independent and dependent code.

 Target dependent code is the implementation of pure virtual function for
 aarch64 target and the call to target independent code.

 Thanks & Regards
 Ajit


 aarch64: Place target independent and dependent changed code in one file

 Common infrastructure of load store pair fusion is divided into target
 independent and target dependent changed code.

 Target independent code is the Generic code with pure virtual function
 to interface betwwen target independent and dependent code.

 Target dependent code is the implementation of pure virtual function for
 aarch64 target and the call to target independent code.

 2024-04-06  Ajit Kumar Agarwal  

 gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Place target
independent and dependent changed code.
>>>
>>> You're going to need a proper ChangeLog eventually, but I guess there's
>>> no need for that right now.
>>>
 ---
  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
  1 file changed, 249 insertions(+), 122 deletions(-)

 diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
 b/gcc/config/aarch64/aarch64-ldp-fusion.cc
 index 22ed95eb743..cb21b514ef7 100644
 --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
 +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
 @@ -138,8 +138,122 @@ struct alt_base
poly_int64 offset;
  };
  
 +// Virtual base class for load/store walkers used in alias analysis.
 +struct alias_walker
 +{
 +  virtual bool conflict_p (int ) const = 0;
 +  virtual insn_info *insn () const = 0;
 +  virtual bool valid () const  = 0;
>>>
>>> Heh, looking at this made me realise there is a whitespace bug here in
>>> the existing code (double space after const).  Sorry about that!  I'll
>>> push an obvious fix for that.
>>>
 +  virtual void advance () = 0;
 +};
 +
 +struct pair_fusion {
 +
 +  pair_fusion () {};
>>>
>>> This ctor looks pointless at the moment.  Perhaps instead we could put
>>> the contents of ldp_fusion_init in here and then delete that function?
>>>
>>
>> Addressed.
>>
 +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
 + bool load_p) = 0;
>>>
>>> Please can we have comments above each of these virtual functions
>>> describing any parameters, what the purpose of the hook is, and the
>>> interpretation of the return value?  This will serve as the
>>> documentation for other targets that want to make use of the pass.
>>>
>>> It might make sense to have a default-false implementation for
>>> fpsimd_op_p, especially if you don't want to make use of this bit for
>>> rs6000.
>>>
>>
>> Addressed.
>>  
 +
 +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
 +  virtual bool pair_trailing_writeback_p () = 0;
>>>
>>> Sorry for the run-around, but: I think this and
>>> handle_writeback_opportunities () should be the same function, either
>>> returning an enum or taking an enum and returning a boolean.
>>>
>>> At a minimum they should have more similar sounding names.
>>>
>>
>> Addressed.
>>
 +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
 +machine_mode mem_mode) = 0;
 +  virtual int pair_mem_alias_check_limit () = 0;
 +  virtual bool handle_writeback_opportunities () = 0 ;
 +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
 +  machine_mode mode) = 0;
 +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,
>>>
>>> Nit: excess whitespace after pats,
>>>
 +  bool load_p) = 0;

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Alex Coplan
On 09/04/2024 17:30, Ajit Agarwal wrote:
> 
> 
> On 05/04/24 10:03 pm, Alex Coplan wrote:
> > On 05/04/2024 13:53, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are incorporated.
> > 
> > Thanks, I was kind-of expecting you to also send the renaming patch as a
> > preparatory patch as we discussed.
> > 
> > Sorry for another meta comment, but: I think the reason that the Linaro
> > CI isn't running tests on your patches is actually because you're
> > sending 1/3 of a series but not sending the rest of the series.
> > 
> > So please can you either send this as an individual preparatory patch
> > (not marked as a series) or if you're going to send a series (e.g. with
> > a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> > series when you make updates.  That way the CI should test your patches,
> > which would be helpful.
> >
> 
> Addressed.
>  
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface betwwen target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >>
> >> aarch64: Place target independent and dependent changed code in one file
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface betwwen target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> 2024-04-06  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>independent and dependent changed code.
> > 
> > You're going to need a proper ChangeLog eventually, but I guess there's
> > no need for that right now.
> > 
> >> ---
> >>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
> >>  1 file changed, 249 insertions(+), 122 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> >> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> index 22ed95eb743..cb21b514ef7 100644
> >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> @@ -138,8 +138,122 @@ struct alt_base
> >>poly_int64 offset;
> >>  };
> >>  
> >> +// Virtual base class for load/store walkers used in alias analysis.
> >> +struct alias_walker
> >> +{
> >> +  virtual bool conflict_p (int ) const = 0;
> >> +  virtual insn_info *insn () const = 0;
> >> +  virtual bool valid () const  = 0;
> > 
> > Heh, looking at this made me realise there is a whitespace bug here in
> > the existing code (double space after const).  Sorry about that!  I'll
> > push an obvious fix for that.
> > 
> >> +  virtual void advance () = 0;
> >> +};
> >> +
> >> +struct pair_fusion {
> >> +
> >> +  pair_fusion () {};
> > 
> > This ctor looks pointless at the moment.  Perhaps instead we could put
> > the contents of ldp_fusion_init in here and then delete that function?
> > 
> 
> Addressed.
> 
> >> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> >> + bool load_p) = 0;
> > 
> > Please can we have comments above each of these virtual functions
> > describing any parameters, what the purpose of the hook is, and the
> > interpretation of the return value?  This will serve as the
> > documentation for other targets that want to make use of the pass.
> > 
> > It might make sense to have a default-false implementation for
> > fpsimd_op_p, especially if you don't want to make use of this bit for
> > rs6000.
> >
> 
> Addressed.
>  
> >> +
> >> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> >> +  virtual bool pair_trailing_writeback_p () = 0;
> > 
> > Sorry for the run-around, but: I think this and
> > handle_writeback_opportunities () should be the same function, either
> > returning an enum or taking an enum and returning a boolean.
> > 
> > At a minimum they should have more similar sounding names.
> > 
> 
> Addressed.
> 
> >> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> >> +machine_mode mem_mode) = 0;
> >> +  virtual int pair_mem_alias_check_limit () = 0;
> >> +  virtual bool handle_writeback_opportunities () = 0 ;
> >> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> >> +  machine_mode mode) = 0;
> >> +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,
> > 
> > Nit: excess whitespace after pats,
> > 
> >> +  bool load_p) = 0;
> >> +  virtual bool pair_mem_promote_writeback_p 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal



On 05/04/24 10:03 pm, Alex Coplan wrote:
> On 05/04/2024 13:53, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are incorporated.
> 
> Thanks, I was kind-of expecting you to also send the renaming patch as a
> preparatory patch as we discussed.
> 
> Sorry for another meta comment, but: I think the reason that the Linaro
> CI isn't running tests on your patches is actually because you're
> sending 1/3 of a series but not sending the rest of the series.
> 
> So please can you either send this as an individual preparatory patch
> (not marked as a series) or if you're going to send a series (e.g. with
> a preparatory rename patch as 1/2 and this as 2/2) then send the entire
> series when you make updates.  That way the CI should test your patches,
> which would be helpful.
>

Addressed.
 
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64: Place target independent and dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface betwwen target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-04-06  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>  independent and dependent changed code.
> 
> You're going to need a proper ChangeLog eventually, but I guess there's
> no need for that right now.
> 
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
>>  1 file changed, 249 insertions(+), 122 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 22ed95eb743..cb21b514ef7 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,8 +138,122 @@ struct alt_base
>>poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int ) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const  = 0;
> 
> Heh, looking at this made me realise there is a whitespace bug here in
> the existing code (double space after const).  Sorry about that!  I'll
> push an obvious fix for that.
> 
>> +  virtual void advance () = 0;
>> +};
>> +
>> +struct pair_fusion {
>> +
>> +  pair_fusion () {};
> 
> This ctor looks pointless at the moment.  Perhaps instead we could put
> the contents of ldp_fusion_init in here and then delete that function?
> 

Addressed.

>> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
>> +   bool load_p) = 0;
> 
> Please can we have comments above each of these virtual functions
> describing any parameters, what the purpose of the hook is, and the
> interpretation of the return value?  This will serve as the
> documentation for other targets that want to make use of the pass.
> 
> It might make sense to have a default-false implementation for
> fpsimd_op_p, especially if you don't want to make use of this bit for
> rs6000.
>

Addressed.
 
>> +
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +  virtual bool pair_trailing_writeback_p () = 0;
> 
> Sorry for the run-around, but: I think this and
> handle_writeback_opportunities () should be the same function, either
> returning an enum or taking an enum and returning a boolean.
> 
> At a minimum they should have more similar sounding names.
> 

Addressed.

>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>> +  machine_mode mem_mode) = 0;
>> +  virtual int pair_mem_alias_check_limit () = 0;
>> +  virtual bool handle_writeback_opportunities () = 0 ;
>> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
>> +machine_mode mode) = 0;
>> +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,
> 
> Nit: excess whitespace after pats,
> 
>> +bool load_p) = 0;
>> +  virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
>> +  virtual bool track_load_p () = 0;
>> +  virtual bool track_store_p () = 0;
> 
> I think it would probably make more sense for these two to have
> default-true implementations rather than being pure virtual functions.
> 
> Also, sorry for the bikeshedding, but please can we keep 

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-05 Thread Alex Coplan
On 05/04/2024 13:53, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are incorporated.

Thanks, I was kind-of expecting you to also send the renaming patch as a
preparatory patch as we discussed.

Sorry for another meta comment, but: I think the reason that the Linaro
CI isn't running tests on your patches is actually because you're
sending 1/3 of a series but not sending the rest of the series.

So please can you either send this as an individual preparatory patch
(not marked as a series) or if you're going to send a series (e.g. with
a preparatory rename patch as 1/2 and this as 2/2) then send the entire
series when you make updates.  That way the CI should test your patches,
which would be helpful.

> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Thanks & Regards
> Ajit
> 
> 
> aarch64: Place target independent and dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-04-06  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.

You're going to need a proper ChangeLog eventually, but I guess there's
no need for that right now.

> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 371 +++
>  1 file changed, 249 insertions(+), 122 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 22ed95eb743..cb21b514ef7 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,8 +138,122 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const  = 0;

Heh, looking at this made me realise there is a whitespace bug here in
the existing code (double space after const).  Sorry about that!  I'll
push an obvious fix for that.

> +  virtual void advance () = 0;
> +};
> +
> +struct pair_fusion {
> +
> +  pair_fusion () {};

This ctor looks pointless at the moment.  Perhaps instead we could put
the contents of ldp_fusion_init in here and then delete that function?

> +  virtual bool fpsimd_op_p (rtx reg_op, machine_mode mem_mode,
> +bool load_p) = 0;

Please can we have comments above each of these virtual functions
describing any parameters, what the purpose of the hook is, and the
interpretation of the return value?  This will serve as the
documentation for other targets that want to make use of the pass.

It might make sense to have a default-false implementation for
fpsimd_op_p, especially if you don't want to make use of this bit for
rs6000.

> +
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +  virtual bool pair_trailing_writeback_p () = 0;

Sorry for the run-around, but: I think this and
handle_writeback_opportunities () should be the same function, either
returning an enum or taking an enum and returning a boolean.

At a minimum they should have more similar sounding names.

> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mem_mode) = 0;
> +  virtual int pair_mem_alias_check_limit () = 0;
> +  virtual bool handle_writeback_opportunities () = 0 ;
> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> + machine_mode mode) = 0;
> +  virtual rtx gen_mem_pair (rtx *pats,  rtx writeback,

Nit: excess whitespace after pats,

> + bool load_p) = 0;
> +  virtual bool pair_mem_promote_writeback_p (rtx pat) = 0;
> +  virtual bool track_load_p () = 0;
> +  virtual bool track_store_p () = 0;

I think it would probably make more sense for these two to have
default-true implementations rather than being pure virtual functions.

Also, sorry for the bikeshedding, but please can we keep the plural
names (so track_loads_p and track_stores_p).

> +  virtual bool cand_insns_empty_p (std::list ) = 0;

Why does this need to be virtualised?  I would it expect it to
just be insns.empty () on all targets.

> +  virtual bool pair_mem_in_range_p