zahiraam added a comment.

Question about this comment:

3. “Approaches #1 and #2 require a lot of intermediate conversions when 
hardware isn't available. In our example, a + b + c has to be calculated as 
(_Float16) ((float) (_Float16) ((float) a + (float) b) + (float) c), where the 
result of one addition is converted down and then converted back again. You can 
avoid this by specifically recognizing this pattern and eliminating the 
conversion from sub-operations that happen to be of type float, so that in our 
example, a + b + c would be calculated as (_Float16) ((float) a + (float) b + 
(float) c). This is actually allowed by the C standard by default as a form of 
FP contraction; in fact, I believe C's rules for FP contraction were originally 
designed for exactly this kind of situation, except that it was emulating float 
with double on hardware that only provided arithmetic on the latter. Obviously, 
this can change results.”
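
To make the "this can change results" part concrete, here is a minimal sketch of the same effect one level up, using float in place of _Float16 and double in place of float, as in the historical case the quote mentions. The function names and input values are made up for illustration, and the sketch assumes IEEE-754 float/double with the default round-to-nearest-even mode.

```c
#include <float.h>

/* Mirrors (_Float16)((float)(_Float16)((float)a + (float)b) + (float)c),
 * with float standing in for _Float16 and double for float:
 * the intermediate sum is rounded down to the narrow type and widened again. */
static float sum_round_each_step(float a, float b, float c) {
    float ab = (float)((double)a + (double)b); /* first rounding */
    return (float)((double)ab + (double)c);    /* second rounding */
}

/* Mirrors (_Float16)((float)a + (float)b + (float)c):
 * the whole expression is evaluated in the wide type and rounded once. */
static float sum_round_once(float a, float b, float c) {
    return (float)((double)a + (double)b + (double)c);
}
```

With a = 1.0f and b = c = FLT_EPSILON / 2, each step of the first version hits a tie and rounds back to 1.0f, so it returns 1.0f. The second version adds both halves exactly in double and returns 1.0f + FLT_EPSILON.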

Without any changes to clang, this test case:
// RUN: %clang_cc1 -triple x86_64-linux  -emit-llvm  < %s
_Float16 foo (_Float16 a, _Float16 b, _Float16 c) {
  return (_Float16) ((float) a + (float) b + (float) c);
}
generates this IR:
target triple = "x86_64-unknown-linux"

; Function Attrs: noinline nounwind optnone
define dso_local half @foo(half %a, half %b, half %c) #0 {
entry:

  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  %c.addr = alloca half, align 2
  store half %a, half* %a.addr, align 2
  store half %b, half* %b.addr, align 2
  store half %c, half* %c.addr, align 2
  %0 = load half, half* %a.addr, align 2
  %conv = fpext half %0 to float
  %1 = load half, half* %b.addr, align 2
  %conv1 = fpext half %1 to float
  %add = fadd float %conv, %conv1
  %2 = load half, half* %c.addr, align 2
  %conv2 = fpext half %2 to float
  %add3 = fadd float %add, %conv2
  %conv4 = fptrunc float %add3 to half
  ret half %conv4

}

And this case:
__fp16 foo (__fp16 a, __fp16 b, __fp16 c) {
  return a + b + c;
}
compiled with these options:

-c -Xclang "-triple" -Xclang "armv7a-linux-gnu" -target arm -emit-llvm -S
generates this IR:
target triple = "armv7a-unknown-linux-gnu"

; Function Attrs: noinline nounwind optnone
define dso_local arm_aapcscc half @foo(half %a, half %b, half %c) #0 {
entry:

  %a.addr = alloca half, align 2
  %b.addr = alloca half, align 2
  %c.addr = alloca half, align 2
  store half %a, half* %a.addr, align 2
  store half %b, half* %b.addr, align 2
  store half %c, half* %c.addr, align 2
  %0 = load half, half* %a.addr, align 2
  %conv = fpext half %0 to float
  %1 = load half, half* %b.addr, align 2
  %conv1 = fpext half %1 to float
  %add = fadd float %conv, %conv1
  %2 = load half, half* %c.addr, align 2
  %conv2 = fpext half %2 to float
  %add3 = fadd float %add, %conv2
  %3 = fptrunc float %add3 to half
  ret half %3

}

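I see no difference in the generated IR: in both cases the operands are
extended to float, added, and the result is truncated to half once at the end.
So this:
// RUN: %clang_cc1 -triple x86_64-linux  -emit-llvm  < %s
_Float16 foo (_Float16 a, _Float16 b, _Float16 c) {
  return a + b + c;
}

should also generate the same IR, right?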



================
Comment at: clang/test/CodeGen/X86/Float16-aritmetic.c:8-9
+  // CHECK: alloca half
+  // CHECK: store half {{.*}}, half*
+  // CHECK: store half {{.*}}, half*
+  // CHECK: load half, half*
----------------
pengfei wrote:
> This isn't correct without the ABI code change. I did some work in D107082. I 
> plan to refactor (if I have enough time)
If this is the output we want to generate, should the changes in D107082 land 
before the changes in this patch?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107
