nikic wrote:

@phoebewang To extend Ralf's reply slightly, I'd just like to say that 
implementing support for a new minnum_ieee intrinsics is massively more effort 
than just supporting minnum with and without nsz. To support nsz on minnum, all 
we need to do is:

 * Handle absence of nsz correctly in libcall lowering, either by forcing 
expansion instead (trivial) or by implementing a fixup sequence (a bit of 
effort). This is in generic code, nothing for the X86 backend to do.
 * In the X86 backend, **delete the code for minnum lowering** and always treat 
minnum like minimumnum, which already implements all the required signed zero 
and nsz handling. The implementation complexity for this change for the X86 
backend is literally negative.

Additionally, we can probably consolidate the existing MINNUM and MINNUM_IEEE 
ISD opcodes and further simplify things, but I have not thought too carefully 
about this (and this choice is orthogonal to what we do in IR -- we can keep 
the opcodes even if we don't have minnum_ieee in IR, just like we already have 
the opcodes now despite having no such intrinsic).

Now, the costs to support minnum_ieee are:

 * Supporting one additional pair of intrinsics in all middle-end 
transformations. This is going to take a large amount of work (I think we 
*still* don't have full optimization parity for minimumnum/minimum, and they 
have existed for a while). Support for each individual transform is going to be 
easy as it's essentially going to be "treat it the same as minnum", but writing 
all the test coverage is certainly a big time sink.
 * If we follow your proposal where minnum_ieee is only supported if there is a 
corresponding hardware instruction, then this will also take additional 
implementation complexity in each frontend, plus probably an additional 
mechanism for LLVM to expose information about this to the frontend, because 
it's really unreasonable to expect that frontends know on exactly which 
targets, subtargets and types an operation is supported.

I also have a hard time understanding what the point of an minnum_ieee op 
without full legalization support even would be. That means that frontends 
can't actually use the "minnum with ordered zero" semantics in a cross-target 
way. Is the suggestion for the frontend to emit minnum_ieee on targets that 
support it and a manual expansion if it isn't? That just doesn't make sense. 
It's more complexity for worse results, because the backend can emit a more 
efficient target specific instruction sequence.

Realistically, we'd never support a minnum_ieee intrinsic that does not have 
full legalization support. This just goes completely against LLVM's design for 
target-independent intrinsics. And if you add full legalization support, then 
minnum_ieee just becomes a different way to spell "minnum without nsz". Just 
that the new way of spelling it comes with a) lots of additional implementation 
complexity and b) a less symmetric/understandable IR design (see the table at 
the end of Ralf's last comment).

> Fast math flags including nsz is different. Some transformations may 
> intentionally or unintentionally discard fast math flags. @nikic and @arsenm 
> know the problem better than me.

nsz is indeed different, but only in that the risk of dropping nsz is actually 
lower than for the usual poison generating flags. There are various transforms 
that have to discard poison flags, e.g. when freeze is pushed upward, or to 
avoid derefinement in certain transforms. But these issues don't affect nsz, as 
it's not poison-generating. It can be retained pretty much always. Of course, 
implementation bugs can exist, but I don't expect this to be a significant 
practical issue.

https://github.com/llvm/llvm-project/pull/172012
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to