LewisCrawford wrote:

> They should be writing the vector maximumnum/minimumnum intrinsics

Although `nvvm.fmax/fmin` are very close to `maximumnum/minimumnum`, their semantics differ slightly: the LLVM intrinsics depend on the global FTZ setting, whereas the nvvm versions encode per instruction whether FTZ is desired. As you can see in NVPTXTargetTransformInfo, we do translate to `llvm.maximumnum` whenever the FTZ semantics make this valid.

Besides cases like `fmin/fmax` with instruction-level FTZ semantics, scalarizable vector forms of target-specific intrinsics would also be useful for:

- Instructions requiring instruction-level rounding modes (e.g. `nvvm_add_rn_f` etc.)
- Target-specific math function approximations (e.g. `nvvm_sin_approx_f`, `amdgcn_cos` etc.)
- Other target-specific instructions like `nvvm_fmin_ftz_xorsign_abs_f`, where the `xorsign_abs` pattern would require a chain of several generic LLVM instructions instead of a single target-specific intrinsic.

> Users should be using canonical, generic patterns which the backend can match
> into target instructions.

Being able to write e.g. `nvvm_add_rz(<32 x float> %x, <32 x float> %y)` seems closer to this ideal of a generic pattern that can be matched into target instructions than requiring users to emit 64 `ExtractElement`s, 32 `nvvm_add_rz`s, and 32 `InsertElement`s, even if it uses a target-specific vector intrinsic rather than a core LLVM one.

https://github.com/llvm/llvm-project/pull/194783
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
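To make the instruction counts in the `nvvm_add_rz` example concrete, here is a minimal sketch that generates the lane-by-lane expansion as LLVM IR text and tallies the opcodes. It is an illustration only, not code from the PR: the helper names `scalarize_binary_intrinsic` and `count_ops` are hypothetical, and `llvm.nvvm.add.rz.f` is assumed to be the scalar intrinsic being wrapped.

```python
def scalarize_binary_intrinsic(intrinsic, lanes, elem_ty="float"):
    """Return LLVM IR text lines applying a scalar intrinsic lane by lane.

    Hypothetical helper for illustration; it only builds strings and does
    not use any LLVM API.
    """
    vec_ty = f"<{lanes} x {elem_ty}>"
    lines = []
    acc = "poison"  # start from a poison vector, then fill every lane
    for i in range(lanes):
        # Two extracts per lane: one from each operand vector.
        lines.append(f"%x{i} = extractelement {vec_ty} %x, i32 {i}")
        lines.append(f"%y{i} = extractelement {vec_ty} %y, i32 {i}")
        # One scalar call per lane.
        lines.append(
            f"%r{i} = call {elem_ty} @{intrinsic}({elem_ty} %x{i}, {elem_ty} %y{i})"
        )
        # One insert per lane to rebuild the result vector.
        lines.append(f"%v{i} = insertelement {vec_ty} {acc}, {elem_ty} %r{i}, i32 {i}")
        acc = f"%v{i}"
    return lines


def count_ops(lines):
    """Count opcodes in lines of the form '%name = opcode ...'."""
    counts = {"extractelement": 0, "call": 0, "insertelement": 0}
    for line in lines:
        opcode = line.split(" = ", 1)[1].split(" ", 1)[0]
        counts[opcode] += 1
    return counts


# Assumption: llvm.nvvm.add.rz.f is the scalar intrinsic name.
ir = scalarize_binary_intrinsic("llvm.nvvm.add.rz.f", 32)
counts = count_ops(ir)
print(counts)  # {'extractelement': 64, 'call': 32, 'insertelement': 32}
```

For a 32-lane vector this produces 128 IR instructions, where the vector form written in the message is a single call.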
