Please ignore this one, I will further refine it. Sorry for disturbing! Thanks, bin
On Tue, Dec 16, 2014 at 4:42 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>>> Hi, >>>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the >>>>> ivopt >>>>> issue still exists. >>>>> >>>>> Current candidate selecting algorithm tends to select fewer candidates >>>>> given >>>>> below reasons: >>>>> 1) to better handle loops with many induction uses but the best choice >>>>> is >>>>> one generic basic induction variable; >>>>> 2) to keep compilation time low. >>>>> >>>>> One fundamental weakness of the strategy is the opposite situation can't >>>>> be >>>>> handled properly sometimes. For these cases the best choice is each >>>>> induction variable has its own candidate. >>>>> This patch fixes the problem by shuffling candidate set after fix-point is >>>>> reached by current implementation. The reason why this strategy works is >>>>> it >>>>> replaces candidate set by selecting local optimal candidate for some >>>>> induction uses, and the new candidate set (has lower cost) is exact what >>>>> we >>>>> want in the mentioned case. Instrumentation data shows this can find >>>>> better >>>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >>>>> >>>>> This patch actually is extension to the first version patch posted at >>>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >>>>> another selecting pass with special seed set (more or less like the >>>>> shuffled >>>>> set in this patch). Data also confirms this patch can find optimal sets >>>>> for >>>>> most loops found by the first one, as well as optimal sets for many new >>>>> loops. >>>>> >>>>> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >>>>> test on aarch64. >>>>> Since this patch only selects candidate set with lower cost, any >>>>> regressions >>>>> revealed are latent bugs of other components in GCC. >>>>> I also collected GCC bootstrap time on x86_64, no regression either. >>>>> Is this OK? >>>> >>>> The algorithm seems to be quadratic in the number of IV candidates >>>> (at least): >>> Yes, I worried about that too, that's why I measured the bootstrap >>> time. One way is restrict this procedure one time for each loop. I >>> already tried that and it can capture +90% loops. Is this sounds >>> reasonable? >> >> Yes. That's my suggestion to handle it in the caller of try_improve_iv_set? >> >>> BTW, do we have some compilation time benchmarks for GCC? >> >> There are various testcases linked from PR47344, I don't remember >> any particular one putting load on IVOPTs (but I do remember seeing >> IVOPTs in the ~25% area in -ftime-report for some testcases). > > > Hi Jeff & Richard, > I updated patch according to your review comments. Is this version looks > good? > I didn't find cases in PR47344 which exercising IVOPT, but I produced > one case from PR53852 which runs ivopt for ~17% of total time (28s). > This patch does increase IVOPT time to 18%. Unfortunately, I tried > the other restriction, it doesn't work as well as this one on spec2k6, > if I understood the method correctly. > > Hi Sebastian, > Thanks for help! I managed to run llvm compilation time tests > successfully as you suggested. Case > Multisource/Benchmarks/mafft/pairlocalalign is regressed but I can't > reproduce it in cmd. The running time of compilation of > pairlocalalign.c is too small comparing to the results. I also tried > to invoke it by using RunSafely.sh but no lucky either. So any > documentation on this? Thanks very much! > > Thanks, > bin > > 2014-12-16 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/62178 > * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function. > (iv_ca_replace): New function. > (try_improve_iv_set): New parameter try_replace_p. > Replace candidates in IVS by calling iv_ca_replace. > (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set. > > gcc/testsuite/ChangeLog > 2014-12-16 Bin Cheng <bin.ch...@arm.com> > > PR tree-optimization/62178 > * gcc.target/aarch64/pr62178.c: New test.