On Fri, Jan 19, 2018 at 5:42 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> This patch is meant to fix the regression caused by loop distribution when
> -ftree-parallelize-loops is enabled.  The reason is that the distributed
> memset call can't be understood/analyzed by data reference analysis; as a
> result, parloop can only parallelize the innermost 2-level loop nest.
> Before the distribution change, parloop could parallelize the innermost
> 3-level loop nest, i.e., more parallelization.
> As commented in the PR, ideally, loop distribution should be able to
> distribute a memset call for the 3-level loop nest.  Unfortunately this
> requires sophisticated work proving equality between tree expressions,
> which GCC is not good at now.
> Another fix is to improve data reference analysis so that memset calls
> can be supported.  We don't know how big this change is and it's definitely
> not a GCC 8 task.
>
> So this patch fixes the regression in a somewhat hacky way.  First, it
> enables 3-level loop nest distribution when flag_tree_parloops > 1.  Second,
> it supports 3-level loop nest distribution for a ZERO-ing stmt which can
> only be distributed as a loop (nest) of memset, but can't be distributed as
> a single memset.  The overall effect is that the ZERO-ing stmt will be
> distributed to one loop deeper than now, so parloop can parallelize as before.
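
To make the "loop (nest) of memset" idea concrete, here is a minimal
hand-written C sketch.  It is not taken from the patch or from the PR
testcase: the array shape and all function names are made up for
illustration.  In this toy case a single memset would also be valid; the
sketch only shows the shape of the partial form, where the outermost loop
stays a loop and the inner levels become a memset call.

```c
#include <string.h>

/* Hypothetical dimensions, chosen only for illustration.  */
#define N 4
#define M 5
#define K 6

/* Original form: a ZERO-ing stmt inside a 3-level loop nest.  */
static void
zero_nest (double (*arr)[M][K])
{
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      for (int k = 0; k < K; k++)
        arr[i][j][k] = 0.0;
}

/* Hand-written equivalent of a "loop of memset": the outermost loop
   is kept and each iteration clears one M x K slice.  Assumes that
   all-zero bytes represent 0.0, which holds for IEEE 754 doubles.  */
static void
zero_partial_memset (double (*arr)[M][K])
{
  for (int i = 0; i < N; i++)
    memset (arr[i], 0, sizeof arr[i]);
}

/* Returns 1 if both forms leave the array in the same state.  */
int
zero_forms_agree (void)
{
  static double a[N][M][K], b[N][M][K];

  /* Start from non-zero contents so the zeroing is observable.  */
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      for (int k = 0; k < K; k++)
        a[i][j][k] = b[i][j][k] = 1.0 + i + j + k;

  zero_nest (a);
  zero_partial_memset (b);
  return memcmp (a, b, sizeof a) == 0;
}
```

In the second form the remaining outer loop is still an ordinary loop,
which is presumably what lets parloop treat it as before.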
>
> Bootstrap and test on x86_64 and AArch64 are ongoing.  Is it OK if there
> are no errors?
Testing finished without errors.  I also checked
-ftree-parallelize-loops=6 on AArch64 and can confirm the regression
is resolved.

Thanks,
bin
>
> Thanks,
> bin
> 2018-01-19  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/82604
>         * tree-loop-distribution.c (enum partition_kind): New enum item
>         PKIND_PARTIAL_MEMSET.
>         (partition_builtin_p): Support above new enum item.
>         (generate_code_for_partition): Ditto.
>         (compute_access_range): Differentiate cases where equality can be
>         proven at all loops, at the innermost loops, or at no loops.
>         (classify_builtin_st, classify_builtin_ldst): Adjust call to above
>         function.  Set PKIND_PARTIAL_MEMSET for partition appropriately.
>         (finalize_partitions, distribute_loop): Don't fuse partition of
>         PKIND_PARTIAL_MEMSET kind when distributing 3-level loop nest.
>         (prepare_perfect_loop_nest): Distribute 3-level loop nest only if
>         parloop is enabled.
