On Mon, Oct 6, 2014 at 11:57 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Oct 1, 2014 at 5:06 AM, Mike Stump <mikest...@comcast.net> wrote:
>> On Sep 30, 2014, at 2:22 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Then I decided to take one step forward and introduce a generic
>>> instruction fusion infrastructure in GCC, because in essence load/store
>>> pairing is no different from other instruction fusions: all these
>>> optimizations want is to push instructions together in the instruction flow.
>>
>> I like the step you took. I had exactly this in mind when I wrote the
>> original.
>>
>>> N0 ~= 1300
>>> N1/N2 ~= 5000
>>> N3 ~= 7500
>>
>> Nice. It would be nice to see timing metrics to ensure that the code isn't
>> actually worse (CSiBE and/or spec and/or something else). I didn't have any
>> large-scale benchmark runs with my code, and I did worry about extending
>> lifetimes and register pressure.
>
> Hi Mike,
> I did collect spec2k performance after pairing load/store using this
> patch on both aarch64 and cortex-a15. Performance improves noticeably,
> especially on cortex-a57. Some (though not many) benchmarks regress a
> little. There is no register pressure problem here because this pass is
> placed between register allocation and sched2, and I expect sched2 to
> resolve most pipeline hazards introduced by this pass.

How many merging opportunities does sched2 undo again? ISTR it
has a tendency to push stores down and loads up.

Richard.

>>
>>> I cleaned up Mike's patch and fixed some implementation bugs in it
>>
>> So, I'm wondering what the bugs or missed opportunities were? And whether
>> they were the type of problem that generated incorrect code or merely a
>> missed opportunity.
> Just missed-opportunity issues.
>
> Thanks,
> bin