On Mon, Oct 6, 2014 at 11:57 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Oct 1, 2014 at 5:06 AM, Mike Stump <mikest...@comcast.net> wrote:
>> On Sep 30, 2014, at 2:22 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Then I decided to go one step further and introduce a generic
>>> instruction fusion infrastructure in GCC, because in essence load/store
>>> pairing is no different from other instruction fusion: all these
>>> optimizations want is to push instructions together in the instruction
>>> stream.
>>
>> I like the step you took.  I had exactly this in mind when I wrote the
>> original.
>>
>>> N0 ~= 1300
>>> N1/N2 ~= 5000
>>> N3 ~= 7500
>>
>> Nice.  It would be nice to see timing metrics to make sure the code isn't
>> actually worse (CSiBE and/or SPEC and/or something else).  I didn't do any
>> large-scale benchmark runs with my code, and I did worry about extending
>> lifetimes and register pressure.
>
> Hi Mike,
> I did collect spec2k performance numbers after pairing loads/stores with
> this patch on both aarch64 and cortex-a15.  Performance improves
> noticeably, especially on cortex-a57, though a few benchmarks regress
> slightly.  There is no register pressure problem here because this pass
> runs between register allocation and sched2; I expect sched2 to resolve
> most of the pipeline hazards introduced by this pass.

How many merging opportunities does sched2 undo again?  ISTR it
has a tendency to push stores down and loads up.

Richard.

>>
>>> I cleaned up Mike's patch and fixed some implementation bugs in it
>>
>> So, I'm wondering what the bugs or missed opportunities were, and whether
>> they were the kind of problem that generated incorrect code or merely
>> missed opportunities.
> Just missed opportunities.
>
> Thanks,
> bin
