nemanjai added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:15130
+      Value *Y = EmitScalarExpr(E->getArg(1));
+      auto Ret = Builder.CreateFDiv(X, Y, "recipdiv");
+      Builder.setFastMathFlags(FMF);
----------------
bmahjour wrote:
> I wonder if we can do better than "fdiv fast"... does the current lowering of 
> "fdiv fast" employ an estimation algorithm via iterative refinement on POWER?
Yes. `fast` includes `arcp`, which triggers the estimate-plus-refinement 
algorithm in the back end.
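
For readers unfamiliar with the idea, here is a rough sketch of what "estimate plus refinement" means for a divide. It is not the actual POWER lowering: the function names are hypothetical, the initial estimate is a textbook linear approximation standing in for a hardware estimate instruction, and it assumes `y` is positive, finite and normal.

```cpp
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate.
// Assumes y is positive, finite and normal.
inline double sketch_recip_estimate(double y) {
  int e = 0;
  double m = std::frexp(y, &e);              // y = m * 2^e, m in [0.5, 1)
  double r = 48.0 / 17.0 - (32.0 / 17.0) * m; // linear guess for 1/m, relative error <= 1/17
  return std::ldexp(r, -e);                  // scale back: approximately 1/y
}

// One Newton-Raphson step for f(r) = 1/r - y.
// If r = (1 - err) / y, the result is (1 - err^2) / y, so the error squares.
inline double sketch_refine_recip(double y, double r) {
  return r * (2.0 - y * r);
}

// x / y computed as x * (refined estimate of 1/y), with no divide instruction.
inline double sketch_approx_div(double x, double y, int steps = 3) {
  double r = sketch_recip_estimate(y);
  for (int i = 0; i < steps; ++i)
    r = sketch_refine_recip(y, r);
  return x * r;
}
```

Each refinement step roughly doubles the number of correct bits, which is why a cheap estimate followed by a few multiply-add steps can stand in for a full divide when `arcp` allows it.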


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:15134
+    }
+    llvm::Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
+    auto Ret = Builder.CreateCall(F, X);
----------------
bmahjour wrote:
> This doesn't implement a reciprocal square root, it just performs a square 
> root! At the very least we need a divide instruction following the call to 
> the intrinsic, but I'm not sure if that'll result in the most optimal codegen 
> at the end. Perhaps we need a new builtin?
Oh, I misread the documentation. This really seems like a bizarre thing to 
offer a user. I will change this to `1/sqrt()`.
As for optimal performance: with fast-math, the optimizer should get rid of 
the divide. At `-O0`, it isn't reasonable to expect optimal performance to 
begin with.
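
To make the intended semantics concrete, a minimal sketch follows. The function names are hypothetical and this is not the code in the patch; the refinement step is only an illustration of how an optimizer can avoid a real divide, not a statement of what the back end emits.

```cpp
#include <cmath>

// What the builtin should compute: the reciprocal of the square root,
// not the square root itself.
inline double sketch_recip_sqrt(double x) {
  return 1.0 / std::sqrt(x);
}

// One Newton-Raphson step for f(r) = 1/r^2 - x, improving an existing
// estimate r of 1/sqrt(x) using only multiplies and a subtract.
inline double sketch_refine_rsqrt(double x, double r) {
  return r * (1.5 - 0.5 * x * r * r);
}
```

Starting from a coarse estimate of `1/sqrt(x)`, repeating the refinement step converges quadratically, so under fast-math the explicit `1.0 / sqrt(x)` form need not cost a divide.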


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101209/new/

https://reviews.llvm.org/D101209

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
