nikic wrote: @phoebewang To extend Ralf's reply slightly, I'd just like to say that implementing support for a new minnum_ieee intrinsics is massively more effort than just supporting minnum with and without nsz. To support nsz on minnum, all we need to do is:
* Handle absence of nsz correctly in libcall lowering, either by forcing expansion instead (trivial) or by implementing a fixup sequence (a bit of effort). This is in generic code, nothing for the X86 backend to do. * In the X86 backend, **delete the code for minnum lowering** and always treat minnum like minimumnum, which already implements all the required signed zero and nsz handling. The implementation complexity for this change for the X86 backend is literally negative. Additionally, we can probably consolidate the existing MINNUM and MINNUM_IEEE ISD opcodes and further simplify things, but I have not thought too carefully about this (and this choice is orthogonal to what we do in IR -- we can keep the opcodes even if we don't have minnum_ieee in IR, just like we already have the opcodes now despite having no such intrinsic). Now, the costs to support minnum_ieee are: * Supporting one additional pair of intrinsics in all middle-end transformations. This is going to take a large amount of work (I think we *still* don't have full optimization parity for minimumnum/minimum, and they have existed for a while). Support for each individual transform is going to be easy as it's essentially going to be "treat it the same as minnum", but writing all the test coverage is certainly a big time sink. * If we follow your proposal where minnum_ieee is only supported if there is a corresponding hardware instruction, then this will also take additional implementation complexity in each frontend, plus probably an additional mechanism for LLVM to expose information about this to the frontend, because it's really unreasonable to expect that frontends know on exactly which targets, subtargets and types an operation is supported. I also have a hard time understanding what the point of an minnum_ieee op without full legalization support even would be. That means that frontends can't actually use the "minnum with ordered zero" semantics in a cross-target way. Is the suggestion for the frontend to emit minnum_ieee on targets that support it and a manual expansion if it isn't? That just doesn't make sense. It's more complexity for worse results, because the backend can emit a more efficient target specific instruction sequence. Realistically, we'd never support a minnum_ieee intrinsic that does not have full legalization support. This just goes completely against LLVM's design for target-independent intrinsics. And if you add full legalization support, then minnum_ieee just becomes a different way to spell "minnum without nsz". Just that the new way of spelling it comes with a) lots of additional implementation complexity and b) a less symmetric/understandable IR design (see the table at the end of Ralf's last comment). > Fast math flags including nsz is different. Some transformations may > intentionally or unintentionally discard fast math flags. @nikic and @arsenm > know the problem better than me. nsz is indeed different, but only in that the risk of dropping nsz is actually lower than for the usual poison generating flags. There are various transforms that have to discard poison flags, e.g. when freeze is pushed upward, or to avoid derefinement in certain transforms. But these issues don't affect nsz, as it's not poison-generating. It can be retained pretty much always. Of course, implementation bugs can exist, but I don't expect this to be a significant practical issue. https://github.com/llvm/llvm-project/pull/172012 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
