On Thu, Mar 2, 2017 at 4:39 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote:
> Hi,
>
> the following patch defines the PARAM_MIN_VECT_LOOP_BOUND parameter in
> the s390 backend. It helps with the vectorization epilogue problem
> described here [1].
> I see an overall performance increase of > 1% in SPECfp2006, yet some
> cases like cactusADM regress. This seems to be caused by the vectorizer
> creating an epilogue guard for one more iteration than before, which, in
> turn, causes e.g. predcom to run on the epilogue that it used to ignore
> before ("Loop iterates only 1 time, nothing to do."). Subsequent,
> minor, effects cause an eventual slowdown.
>
> Until the reason for the bad epilogue code is understood, this patch
> mitigates the problem. When investigating the issue, I stumbled across
> an attempt to vectorize the epilogue itself as well as combine it with
> the vectorized loop in addition to vector masking [2]. A similar
> approach might also help here. My original observation of high register
> pressure within the epilogue still stands. In this specific case, it
> would most likely suffice to save all registers once, run the epilogue
> and restore the registers. I'm pretty sure this would be faster than
> the "spill fest" that's currently happening.

Note it also honors PARAM_MIN_VECT_LOOP_BOUND if there's no epilogue.

Richard.

> Regards
> Robin
>
> [1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html
>
> --
>
> gcc/ChangeLog:
>
> 2017-03-02 Robin Dapp <rd...@linux.vnet.ibm.com>
>
> 	* config/s390/s390.c (s390_option_override_internal): Set
> 	PARAM_MIN_VECT_LOOP_BOUND
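
To make the terms concrete, here is a rough hand-written sketch of the shape under discussion: a runtime guard, a main loop that handles VF elements per iteration, and a scalar epilogue for the remainder. It is not GCC output and not code from the patch. The names `VF`, `MIN_VECT_LOOP_BOUND`, `add_scalar` and `add_versioned`, the chosen values, and the exact guard comparison are all assumptions for illustration; the threshold GCC actually computes may differ.

```c
#include <stddef.h>

/* Hypothetical values, chosen only for illustration. */
enum { VF = 4 };                  /* assumed vectorization factor */
enum { MIN_VECT_LOOP_BOUND = 2 }; /* assumed parameter value */

/* Plain scalar loop: runs when the guard rejects the vector path. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Stand-in for a vectorized loop with a guard and a scalar epilogue. */
void add_versioned(float *dst, const float *a, const float *b, size_t n)
{
    /* Guard: require at least MIN_VECT_LOOP_BOUND full vector
       iterations.  The precise comparison is an assumption. */
    if (n < (size_t)VF * MIN_VECT_LOOP_BOUND) {
        add_scalar(dst, a, b, n);
        return;
    }

    size_t i = 0;

    /* Main loop: VF elements per iteration.  The inner lane loop
       models a single vector operation. */
    for (; i + VF <= n; i += VF)
        for (size_t lane = 0; lane < VF; lane++)
            dst[i + lane] = a[i + lane] + b[i + lane];

    /* Epilogue: the remaining n % VF elements, at most VF - 1.
       When n is a multiple of VF it does no work, but the guard
       above was still applied. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With these assumed values, n = 8 takes the main loop twice and leaves nothing for the epilogue, n = 11 leaves three elements for it, and n = 3 never reaches the vector path at all.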