On Fri, Jan 19, 2018 at 5:42 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> This patch is supposed to fix the regression caused by loop distribution
> when -ftree-parallelize-loops is in use. The reason is that a distributed
> memset call can't be understood/analyzed by data reference analysis; as a
> result, parloop can only parallelize the innermost 2-level loop nest.
> Before the distribution change, parloop could parallelize the innermost
> 3-level loop nest, i.e., more parallelization.
> As commented in the PR, ideally loop distribution should be able to
> distribute a memset call for the 3-level loop nest. Unfortunately this
> requires sophisticated work proving equality between tree expressions,
> which GCC is not good at now.
> Another fix is to improve data reference analysis so that memset calls
> can be supported. We don't know how big this change is, and it's
> definitely not a GCC 8 task.
>
> So this patch fixes the regression in a somewhat hacky way. First, it
> enables 3-level loop nest distribution when flag_tree_parloops > 1.
> Second, it supports 3-level loop nest distribution for a ZERO-ing stmt
> that can only be distributed as a loop (nest) of memset, but can't be
> distributed as a single memset. The overall effect is that the ZERO-ing
> stmt will be distributed one loop deeper than now, so parloop can
> parallelize as before.
>
> Bootstrap and test on x86_64 and AArch64 ongoing. Is it OK if no errors?

Test finished without errors. I also checked -ftree-parallelize-loops=6 on
AArch64 and can confirm the regression is resolved.

Thanks,
bin
>
> Thanks,
> bin
> 2018-01-19  Bin Cheng  <bin.ch...@arm.com>
>
>     PR tree-optimization/82604
>     * tree-loop-distribution.c (enum partition_kind): New enum item
>     PKIND_PARTIAL_MEMSET.
>     (partition_builtin_p): Support above new enum item.
>     (generate_code_for_partition): Ditto.
>     (compute_access_range): Differentiate cases that equality can be
>     proven at all loops, the innermost loops or no loops.
>     (classify_builtin_st, classify_builtin_ldst): Adjust call to above
>     function.  Set PKIND_PARTIAL_MEMSET for partition appropriately.
>     (finalize_partitions, distribute_loop): Don't fuse partition of
>     PKIND_PARTIAL_MEMSET kind when distributing 3-level loop nest.
>     (prepare_perfect_loop_nest): Distribute 3-level loop nest only if
>     parloop is enabled.
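
For illustration only, here is a minimal C sketch of the distinction drawn above between a single memset and a loop (nest) of memset. The sizes, the padding, and all names (grid_t, zero_nest, zero_partial_memset, check_equivalence) are hypothetical and are not taken from the PR test case. The second function is hand-written code showing the general shape, not output from the pass; which loop levels actually become memset depends on what compute_access_range can prove.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes.  PAD keeps each zeroed row from being contiguous
   with the next one, so the whole nest cannot become a single memset.  */
enum { NI = 3, NJ = 4, NK = 5, PAD = 2 };

typedef int grid_t[NI][NJ][NK + PAD];

/* Original form: a 3-level nest zeroing the first NK elements of each row.  */
void
zero_nest (grid_t a)
{
  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      for (int k = 0; k < NK; k++)
        a[i][j][k] = 0;
}

/* Hand-written "loop nest of memset": only the innermost loop is replaced
   by a memset call; the two outer loops stay ordinary loops.  */
void
zero_partial_memset (grid_t a)
{
  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      memset (&a[i][j][0], 0, NK * sizeof a[i][j][0]);
}

/* Fill two grids with a sentinel, zero each one a different way, and
   check that the results (including the untouched padding) agree.  */
void
check_equivalence (void)
{
  static grid_t x, y;

  memset (x, 0x5a, sizeof x);
  memset (y, 0x5a, sizeof y);
  zero_nest (x);
  zero_partial_memset (y);

  assert (memcmp (x, y, sizeof x) == 0);
  assert (x[NI - 1][NJ - 1][NK - 1] == 0);  /* zeroed element */
  assert (x[0][0][NK] != 0);                /* padding left alone */
}
```

Calling check_equivalence () from a test driver confirms that both forms write the same elements. The point of the sketch is only that the outer loops remain visible as loops in the second form.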