nemanjai added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:15130
+    Value *Y = EmitScalarExpr(E->getArg(1));
+    auto Ret = Builder.CreateFDiv(X, Y, "recipdiv");
+    Builder.setFastMathFlags(FMF);
----------------
bmahjour wrote:
> I wonder if we can do better than "fdiv fast"... does the current lowering of
> "fdiv fast" employ an estimation algorithm via iterative refinement on POWER?
Yes. This `fast` includes `arcp` which will trigger the estimation+refinement algorithm in the back end.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:15134
+  }
+  llvm::Function *F = CGM.getIntrinsic(Intrinsic::sqrt, ResultType);
+  auto Ret = Builder.CreateCall(F, X);
----------------
bmahjour wrote:
> This doesn't implement a reciprocal square root, it just performs a square
> root! At the very least we need a divide instruction following the call to
> the intrinsic, but I'm not sure if that'll result in the most optimal codegen
> at the end. Perhaps we need a new builtin?
Oh, I misread the documentation. This really seems like a bizarre thing to offer a user. I will change this to `1/sqrt()`.

In terms of providing optimal performance, with fast-math, the optimizer should get rid of the divide. If compiled at `-O0`, it isn't reasonable to expect optimal performance to begin with.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101209/new/

https://reviews.llvm.org/D101209

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
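
For reference, the semantics at issue in the comment on line 15134 (a reciprocal square root, i.e. `1/sqrt(x)`, as opposed to a plain square root) can be sketched in ordinary C++. This is only an illustration of the intended result, not the code in the patch: the helper names `rsqrt_reference` and `sqrt_only` are hypothetical, and whether a compiler actually replaces the divide with an estimate plus refinement depends on the fast-math flags and the target.

```cpp
#include <cmath>

// Hypothetical helper (not part of the patch): the value a reciprocal
// square root is expected to produce, written as an explicit 1/sqrt(x).
double rsqrt_reference(double x) {
    return 1.0 / std::sqrt(x);
}

// Hypothetical helper (not part of the patch): what a bare call to the
// sqrt intrinsic computes, shown only for contrast.
double sqrt_only(double x) {
    return std::sqrt(x);
}
```

For an input of 4.0, `rsqrt_reference` yields 0.5 while `sqrt_only` yields 2.0, which is the discrepancy pointed out in the review.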