Hi Bill,

I’d be a bit worried about asking the backend for the cost of a
COND_REDUCTION, as that will rely on the backend understanding the
implementation the vectorizer is using - every time the vectorizer
changed, the backends would need to be updated too. I’m hoping soon to get
together a patch to reduce the stmts produced on the simpler cases, which
would require a different set of costings. I can also imagine further
improvements being added for other special cases over time. Having the
backends understand every variation would be a little cumbersome.

As it stands today, we correctly exit the optimisation if max reduction
isn’t supported in hardware, which is what the cost model is expecting.


If power wanted to use this implementation, then I think it’d probably
need some code in tree-vect-generic.c to implement on emulated max
reduction, which would then require updates to the costs modelling of
anything that uses max reduction (not just cond reduction). All of that is
outside the scope of this patch.


Thanks,
Alan.

On 10/09/2015 23:14, "Bill Schmidt" <wschm...@linux.vnet.ibm.com> wrote:

>Hi Alan,
>
>The cost modeling of the epilogue code seems pretty target-specific ("An
>EQ stmt and an AND stmt, reduction of the max index and a reduction of
>the found values, a broadcast of the max value," resulting in two
>vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
>this will not represent the cost accurately, and the cost will indeed be
>quite different depending on the mode (logarithmic in the number of
>elements).  I think that you need to create a new entry in
>vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
>each target to calculate the cost appropriately.
>
>(Powerpc doesn't have a max-reduction hardware instruction, but because
>the reduction will be only in the epilogue code, it may still be
>profitable for us to generate the somewhat expensive reduction sequence
>in order to vectorize the loop.  But we definitely want to model it as
>costly in and of itself.  Also, the sequence will produce the maximum
>value in all positions without a separate broadcast.)
>
>Thanks,
>Bill
>
>On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
>> Hi,
>> This patch (attached) adds support for vectorizing conditional
>>expressions
>> (PR 65947), for example:
>> 
>> int condition_reduction (int *a, int min_v)
>> {
>>   int last = 0;
>>   for (int i = 0; i < N; i++)
>>     if (a[i] < min_v)
>>       last = a[i];
>>   return last;
>> }
>> 
>> To do this the loop is vectorised to create a vector of data results (ie
>> of matching a[i] values). Using an induction variable, an additional
>> vector is added containing the indexes where the matches occured. In the
>> function epilogue this is reduced to a single max value and then used to
>> index into the vector of data results.
>> When no values are matched in the loop, the indexes vector will contain
>> all zeroes, eventually matching the first entry in the data results
>>vector.
>> 
>> To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This
>>is
>> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
>> REDUC_MAX_EXPR is not supported for the required modes, failing the
>> vectorization. On mips it complains that the required vcond expression
>>is
>> not supported. It is suggested the relevant backend experts add the
>> required backend support.
>> 
>> Using a simple testcase based around a large number of N and run on an
>> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
>> it's original time.
>> 
>> This patch caused binary differences in three spec2006 binaries on
>>aarch64
>> - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
>> showed no improvement or degregation in runtime.
>> 
>> 
>> In the near future I hope to submit a further patch (as PR 66558) which
>> optimises the case where the result is simply the index of the loop, for
>> example:
>> int condition_reduction (int *a, int min_v)
>> {
>>   int last = 0;
>>   for (int i = 0; i < N; i++)
>>     if (a[i] < min_v)
>>       last = i;
>>   return last;
>> }
>> In this case a lot of the new code can be optimized away.
>> 
>> I have run check for aarch64, arm and x86 and have seen no regressions.
>> 
>> 
>> Changelog:
>> 
>>     2015-08-28  Alan Hayward <alan.hayw...@arm.com>
>> 
>>         PR tree-optimization/65947
>>         * tree-vect-loop.c
>>         (vect_is_simple_reduction_1): Find condition reductions.
>>         (vect_model_reduction_cost): Add condition reduction costs.
>>         (get_initial_def_for_reduction): Add condition reduction initial
>> var.
>>         (vect_create_epilog_for_reduction): Add condition reduction
>>epilog.
>>         (vectorizable_reduction): Condition reduction support.
>>         * tree-vect-stmts.c
>>         (vectorizable_condition): Add vect reduction arg
>>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>>         vect_max_reduc
>> 
>>     testsuite/Changelog:
>> 
>>         PR tree-optimization/65947
>>         * lib/target-supports.exp
>>         (check_effective_target_vect_max_reduc): Add.
>>         * gcc.dg/vect/pr65947-1.c: New test.
>>         * gcc.dg/vect/pr65947-2.c: New test.
>>         * gcc.dg/vect/pr65947-3.c: New test.
>>         * gcc.dg/vect/pr65947-4.c: New test.
>>         * gcc.dg/vect/pr65947-5.c: New test.
>>         * gcc.dg/vect/pr65947-6.c: New test.
>>         * gcc.dg/vect/pr65947-7.c: New test.
>>         * gcc.dg/vect/pr65947-8.c: New test.
>>         * gcc.dg/vect/pr65947-9.c: New test.
>>         * gcc.dg/vect/pr65947-10.c: New test.
>>         * gcc.dg/vect/pr65947-11.c: New test.
>> 
>> 
>> 
>> Thanks,
>> Alan
>> 
>> 
>
>


Reply via email to