On Sep 30, 2014, at 2:22 AM, Bin Cheng <bin.ch...@arm.com> wrote:
> Then I decided to take one step forward to introduce a generic
> instruction fusion infrastructure in GCC, because in essence, load/store
> pair is nothing different with other instruction fusion, all these
> optimizations want is to push instructions together in instruction flow.
I like the step you took. I had exactly this in mind when I wrote the original.

> N0 ~= 1300
> N1/N2 ~= 5000
> N3 ~= 7500

Nice. It would be good to see timing numbers to make sure the code isn't actually worse (CSiBE and/or SPEC and/or some other benchmark). I didn't do any large-scale benchmark runs with my code, and I did worry about extending lifetimes and increasing register pressure.

> I cleared up Mike's patch and fixed some implementation bugs in it

So, I'm wondering what the bugs or missed opportunities were. Were they the kind of problem that generated incorrect code, or were they merely missed opportunities?