[PATCH] D113107: Support of expression granularity for _Float16.

Phoebe Wang via Phabricator via cfe-commits Wed, 22 Jun 2022 06:29:50 -0700

pengfei added inline comments.


================
Comment at: clang/lib/CodeGen/CGExprComplex.cpp:896
+
+ComplexPairTy ComplexExprEmitter::EmitPromoted(const Expr *E) {
+  if (auto *BinOp = dyn_cast<BinaryOperator>(E->IgnoreParens())) {
----------------
zahiraam wrote:
> pengfei wrote:
> > rjmccall wrote:
> > > pengfei wrote:
> > > > rjmccall wrote:
> > > > > zahiraam wrote:
> > > > > > rjmccall wrote:
> > > > > > > `EmitPromoted` should take the promotion type.
> > > > > > > 
> > > > > > > You are missing the logic in this file which handles *unpromoted* 
> > > > > > > emission (where you have a binary operator but the context wants 
> > > > > > > an unpromoted value) by recognizing that you need to do a 
> > > > > > > promoted operation and then truncate.
> > > > > > Sorry but not sure what you mean here. A case where we don't want 
> > > > > > any promotion would be:
> > > > > > 
> > > > > > float _Complex add(float _Complex a, float _Complex b) {
> > > > > >   return a + b;
> > > > > > }
> > > > > > 
> > > > > > In this case, getPromotionType would return the null type and 
> > > > > > EmitBinOps would just go through the  "older" control path where 
> > > > > > there is no promotion.
> > > > > > Unless I misunderstood your comment?
> > > > > > 
> > > > > > 
> > > > > I'm talking about the unpromoted emission path for an operation that 
> > > > > we want to *internally* promote.  So like your example but using 
> > > > > `_Float16` — we want to emit the binary `+` as a `float` operation, 
> > > > > but the context needs an unpromoted value because it's going to be 
> > > > > returned.
> > > > > 
> > > > > The promoted emission path (the various `EmitPromoted` methods) 
> > > > > represents a request by the caller to produce a result that doesn't 
> > > > > match the formal type of the expression.  The normal emission path 
> > > > > `Visit` etc.) represents a request by the caller to produce a result 
> > > > > normally, i.e. one that matches the formal type.  In general, we 
> > > > > always start in the latter because arbitrary contexts always expect a 
> > > > > value of the formal type; it's only these recursive calls within 
> > > > > promoted emitters that contextually want a promoted value.
> > > > > 
> > > > > In your current patch, you're entering the promoted path by 
> > > > > special-casing one context that frequently terminates promoted 
> > > > > emission: `EmitComplexExprIntoLValue`, which is used for things like 
> > > > > assignment.  That's not the right way to handle it, though.  Instead, 
> > > > > you should do like I asked you to do with the scalar emitter, which 
> > > > > is to recognize in the normal emission path for a promotable 
> > > > > operation (like binary `+`) that you need to promote the operation 
> > > > > and then unpromote the result.  Then things like 
> > > > > `EmitComplexExprIntoLValue` will continue to simply enter the normal 
> > > > > path because that's the kind of value they need, and the promotion 
> > > > > logic will do the right thing from there to ensure we don't emit a 
> > > > > `_Float16` operation in LLVM IR.
> > > > > 
> > > > > Incidentally, I thought of another context that you ought to do 
> > > > > promotion for: when we're converting a promotable value to a larger 
> > > > > floating-point type, we should presumably convert the promoted value 
> > > > > directly rather than truncating it to the unpromoted type before the 
> > > > > conversion.
> > > > I'm not familiar with the FE details, but I understand the background 
> > > > is just to promote expressions that have more than one operation. So I 
> > > > agree with @zahiraam, and
> > > > > to ensure we don't emit a _Float16 operation in LLVM IR
> > > > is not what we expected here.
> > > > 
> > > > `_Float16` is a ABI type for which a target supported has to lower any 
> > > > `_Float16` operations in LLVM IR. See what we have done on X86 backend 
> > > > by D107082.
> > > > 
> > > > That said, a single `a + b` can be and should be emitted as `%c = fadd 
> > > > half %a, %b`.
> > > Well, you can provide operation-by-operation lowering support if you want 
> > > to, but what's going to come out of Clang IRGen in this mode is not going 
> > > to rely on it.  It's far easier to just emit these operations 
> > > differently, always using `float`, than to try to recognize the special 
> > > cases where all the inputs are naturally `half` and so you might as well 
> > > emit a `half` operation.
> > > 
> > > And I think the postcondition that `half` operations don't come out of 
> > > IRGen is a generally useful property even if you've already written this 
> > > lowering code.  Among other things, it makes it easy to test that you 
> > > haven't missed a case where you should be doing promoted emission.
> > We define `_Float16` as ABI type and require targets to support the ABI 
> > lowering. See 
> > https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
> >  Promoting in FE doesn't mean it works on any targets by default, so I 
> > think it's not much useful.
> > We need to support `-fexcess-precision=16` the same as GCC does, it's 
> > easier to do nothing than promote and then truncate each of the operations 
> > in FE.
> > I think in general, lowering operations for a type is a business of 
> > backend. We still need backend to do so for other FEs even we do it in 
> > Clang. A recent example is BF16, which people chose to lower in backend 
> > rather than mlir, see D126444.
> > 
> > > Among other things, it makes it easy to test that you haven't missed a 
> > > case where you should be doing promoted emission.
> > I don't understand, how can we check the missed cases if the promotion was 
> > done in FE?
> This patch is doing by default what GCC is doing with the 
> -fexcess-precision=16. And that would be for expressions with any number of 
> operators. But it looks like you want promotion/truncation only for 
> expression with more than one operator?  I think I agree with @rjmccall that 
> it is easier done this way than trying to recognize the special cases.
> 
> What you want is for a+b to generate fadd half (not extension, and no 
> truncation ) and for a+b*c to generated a set of fpext, fadd float and 
> fptrunc?
> Does GCC use their option for special cases too? I would think that when this 
> option is used all expression (regardless of the number of operators) are 
> extended/truncated right?
> 
> I am not sure I like the idea of lowering some cases in the FE and some 
> others in the BE. I think it would be cleaner to decide that lowering be done 
> in FE for all expressions and have BE react to it. 
> 
> @rmjccall not sure what you mean by:
> "And I think the postcondition that half operations don't come out of IRGen 
> is a generally useful property even if you've already written this lowering 
> code. Among other things, it makes it easy to test that you haven't missed a 
> case where you should be doing promoted emission."
> 
> 
> 
> 
Oh, we may misunderstand each other. Let me clarify my understanding about 
`expression` so that I can make sure we are talking about the same thing.
When I mentioned `expression`, I meant to the C/C++ expressions like:
```
_Float16 t1 = a + b;
_Float16 t2 = a + b + c;
```
So when I talk about `expression granularity`, I mean we do truncation once not 
matter how many actual operations in the expression. This is an example of the 
differences when GCC generates code for them https://godbolt.org/z/YndnP8eM4

> This patch is doing by default what GCC is doing with the 
> -fexcess-precision=16. And that would be for expressions with any number of 
> operators.

I thought this patch is doing the GCC default behavior that without 
`-fexcess-precision=16`. For `-fexcess-precision=16`, we don't need to do any 
promotion at all in FE.

> Does GCC use their option for special cases too? I would think that when this 
> option is used all expression (regardless of the number of operators) are 
> extended/truncated right?

GCC needs the option because it only truncates once per expression by default. 
Is it the same mean as you mentioned `all expression are extended/truncated`?

> I am not sure I like the idea of lowering some cases in the FE and some 
> others in the BE. I think it would be cleaner to decide that lowering be done 
> in FE for all expressions and have BE react to it.

The `expression granularity` I expected can only be done in the FE, because we 
don't have the scope information of C/C++ expression in IR. FE can choose to do 
the promotion/truncation pre operation, or leave it the backend which is what I 
suggested here.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D113107: Support of expression granularity for _Float16.

Reply via email to