fhahn added a comment.

In D117829#3260772 <https://reviews.llvm.org/D117829#3260772>, @RKSimon wrote:

> I'm happy to continue with this just for integers or wait until we have a 
> plan for floats as well. I guess we need to decide if we want to support the 
> starting value in the fadd/fmul intrinsics from the builtin or not? If we 
> don't then adding float support to the add/mul reduction builtins now (or 
> later on) would just involve using default starting values, if we do then we 
> probably need a separate fadd/fmul builtin.

I'm not sure if there's a major benefit of specifying the start value? Unless 
there is, I think we should not add a start value argument.

> Or we could add starting values to the add/mul reduction builtins as well and 
> we manually insert a scalar post-reduction add/mul instruction in cgbuiltin?
>
> wrt float orders, currently the avx512f reductions attach fmf attributes when 
> the builtin is translated to the intrinsic: 
> https://github.com/llvm/llvm-project/blob/d2012d965d60c3258b3a69d024491698f8aec386/clang/lib/CodeGen/CGBuiltin.cpp#L14070

I am not familiar how exactly `ia32_reduce_fadd_pd512` & co are specified, but 
it looks like adding `reassoicate` to the intrinsic call there might be 
incorrect technically, i.e. calming the order does not matter, while I assume 
the C builtin guarantees a particular order? It might not result in an actual 
mis-compile on X86, because the X86 backend kind of guarantees that reductions 
with `reassoicate` are lowered exactly as the C builtin requires (relying on 
some kind of secret handshake between frontend & backend).

As discussed in  D117480 <https://reviews.llvm.org/D117480> this seems like a 
major weakness in how `llvm.vector.reduce.fadd` is specified. As long as it is 
only used for target specific builtins, things might work out fine in most 
cases, but different targets might lower `reassoicate` reductions differently. 
Also, things might fall apart if the middle end uses the `reassoicate` property 
during a transform that changes the reduction order.

> We might be able to get away with just expecting users to handle this with 
> pragmas in code?

If the user allows reassoication via a pragma/flag, we can use the reduction 
builtin at the moment without problems. The motivation behind specifying a well 
defined order that can be lowered efficiently by targets is to guarantee 
portable results by default. This is important for uses cases where the 
builtins are used in code that's executed on different platforms.

> I was planning to see if that would work for the avx512f fmin/fmax reduction 
> intrinsics so I can use `__builtin_reduce_min/max`.

There shouldn't be any problem with fmin/fmax, as the result should be 
independent of the evaluation order IIUC.

> I'm mainly interested in `__builtin_reduce_mul` as avx512f requires it - the 
> (few) cases I've see it used have always involved char/short pixel data, 
> extended to int/long before the mul reduction to address the overflow issues.

Sounds reasonable!  As mentioned earlier, it would be good to do that as 
separate patch, also updating LanguageExtensions.rst.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117829/new/

https://reviews.llvm.org/D117829

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to