sepavloff added inline comments.
================ Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:99 + // 0 < x0(b) < 1 + // abs(x0(b) - 1/b) <= 3/4 - 1/sqrt(2) + ---------------- This estimation is absent from the original comment. Do you have reference where it came from? Also the original comment states `This is accurate to about 3.5 binary digits`. Is still true? If yes, it could be worth copying here. ================ Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:101-102 + + // Then, refine the reciprocal estimate using a Newton-Raphson iteration: + // x_{n+1} = x_n * (2 - x_n * b) + // Let e_n := x_n - 1/b_hw ---------------- The original comment states: ``` // This doubles the number of correct binary digits in the approximation // with each iteration. ``` It is true in this implementation? If yes, it could be worth copying here. ================ Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:109 + +#if NUMBER_OF_HALF_ITERATIONS > 0 + // Starting with (n-1) half-width iterations ---------------- It is good optimization. Could you please put a comment shortly describing the idea of using half-sized temporaries? ================ Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:114 + // C is (3/4 + 1/sqrt(2)) - 1 truncated to W0 fractional bits as UQ0.HW +#if defined(SINGLE_PRECISION) + const half_rep_t C_hw = HALF_REP_C(0x7504) << (HW - 16); ---------------- In what cases 16-bit temporaries are used? `NUMBER_OF_HALF_ITERATIONS` is set to zero in `divsf3.c`. ================ Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:130 + // Cannot overflow by construction and is larger than 2 - b*x by at most 1*Ulp. + half_rep_t corr_UQ1_hw = 0 /* = 2 */ - ((rep_t)x_UQ0_hw * b_UQ1_hw >> HW); + // Would be at most [1.]00000 after overflow if rounding (0 - x_UQ0_hw * b_UQ1_hw) down. ---------------- It would be better to put short comment to explain using 0 instead of 2. ================ Comment at: compiler-rt/lib/builtins/fp_div_impl.inc:184 + // x_UQ0 * b_UQ1 = (x_UQ0_hw * 2^HW) * (b_UQ1_hw * 2^HW + blo) - b_UQ1 + rep_t corr_UQ1 = 0 - ( (rep_t)x_UQ0_hw * b_UQ1_hw + + ((rep_t)x_UQ0_hw * blo >> HW) ---------------- `x_UQ0_hw` and `b_UQ1_hw` are declared inside the conditional block `#if NUMBER_OF_HALF_ITERATIONS > 0`. Does `NUMBER_OF_FULL_ITERATIONS != 1` always imply `NUMBER_OF_HALF_ITERATIONS > 0 `? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D85031/new/ https://reviews.llvm.org/D85031 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits