On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > I am familiar with SVE extension and understand that implemented > approach might be not suitable for ARM. The main point is that only > load/store instructions are masked but all other calculations are not > (we did special conversion for reduction statements to implement > merging predication semantic). For SVE peeling for niters is not > required but it is not true for x86 - we must determine what > vectorization scheme is more profitable: loop combining (the only > essential for SVE) or separate epilogue vectorization using masking or > less vectorization factor. So I'd like to have the full list of > required changes to our implementation to try to remove them. Hmm, sorry that my comment gave impression that I was trying to hold back the patch, it's not what I meant by any means. Also it's not related to SVE, As a matter of fact, I haven't read any document about SVE yet. Sorry again for the false impression conveyed by previous messages.
Thanks, bin > > Thanks. > Yuri. > > 2016-11-09 14:46 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>: >> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Thanks Richard for your comments. >>> Your proposed to handle epilogue loop just like normal short-trip loop >>> but this is not exactly truth since e.g. epilogue must not be peeled >>> for alignment. >> Yes there must be some differences, my motivation is to minimize that >> so we don't need to specially check normal/epilogue loops at too many >> places. >> Of course it's just my feeling when going through the patch set, and >> could be wrong. >> >> Thanks, >> bin >>> >>> It is not clear for me what are my next steps? Should I re-design the >>> patch completely or simply decompose the whole patch to different >>> parts? But it means that we must start review process from beginning >>> but release is closed to its end. >>> Note also that i work for Intel till the end of year and have not idea >>> who will continue working on this project. >>> >>> Any help will be appreciated. >>> >>> Thanks. >>> Yuri. >>> >>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>: >>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Hi All, >>>>> >>>>> I re-send all patches sent by Ilya earlier for review which support >>>>> vectorization of loop epilogues and loops with low trip count. We >>>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not >>>>> approved by Jeff. >>>>> >>>>> I did re-base of all patches and performed bootstrapping and >>>>> regression testing that did not show any new failures. Also all >>>>> changes related to new vect_do_peeling algorithm have been changed >>>>> accordingly. >>>>> >>>>> Is it OK for trunk? >>>> >>>> Hi, >>>> I can't approve patches, but had some comments after going through the >>>> implementation. >>>> >>>> One confusing part is cost model change, as well as the way it's used >>>> to decide how epilogue loop should be vectorized. Given vect-tail is >>>> disabled at the moment and the cost change needs further tuning, is it >>>> reasonable to split this part out and get vectorization part >>>> reviewed/submitted independently? For example, let user specified >>>> parameters make the decision for now. Cost and target dependent >>>> changes should go in at last, this could make the patch easier to >>>> read. >>>> >>>> The implementation computes/shares quite amount information from main >>>> loop to epilogue loop vectorization. Furthermore, variables/fields >>>> for such information are somehow named in a misleading way. For >>>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >>>> flag controlling whether epilogue loop should be vectorized with >>>> masking. However, it's actually controlled by exactly the same flag >>>> as whether epilogue loop should be combined into the main loop with >>>> masking: >>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >>>> >>>> slpeel_make_loop_iterate_ntimes (loop, niters_vector); >>>> >>>> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >>>> + vect_combine_loop_epilogue (loop_vinfo); >>>> + >>>> /* Reduce loop iterations by the vectorization factor. */ >>>> scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >>>> expected_iterations / vf); >>>> >>>> IMHO, we should decouple main loop vectorization and epilogue >>>> vectorization as much as possible by sharing as few information as we >>>> can. The general idea is to handle epilogue loop just like normal >>>> short-trip loop. For example, we can rename >>>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >>>> else), and we don't differentiate its meaning between main and >>>> epilogue(short-trip) loop. It only indicates the current loop should >>>> be vectorized with masking no matter it's a main loop or epilogue >>>> loop, and it works just like the current implementation. >>>> >>>> After this change, we can refine vectorization and make it more >>>> general for normal loop and epilogue(short trip) loop. For example, >>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >>>> loop and use it to control how it should be vectorized: >>>> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >>>> + { >>>> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >>>> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; >>>> + } >>>> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) >>>> + && min_profitable_combine_iters >= 0) >>>> + { >>>> >>>> This works, but not that good for understanding or maintaining. >>>> >>>> Thanks, >>>> bin