On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:
I've checked the generated code, and the compiler doesn't work out this
equivalence on its own. I've added a comment to explain it.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

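Simple change to std::chrono::year::is_leap. If a year is a multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16 (writing the
year as 100 * k, both conditions reduce to k % 4 == 0, since 100 = 4 * 25
and 25 is coprime to 4). The latter test allows for better code generation.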
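A quick brute-force check of that identity, sketched below in C++ (the function name identity_holds is made up for illustration), can walk every value a std::chrono::year can hold, [-32767, 32767], and confirm that the two tests agree on every multiple of 100. The identity is specific to multiples of 100: 48 % 16 == 0, but 48 % 400 != 0.

```cpp
// Hypothetical helper: returns true if, for every multiple of 100 in
// [lo, hi], "y % 400 == 0" and "y % 16 == 0" give the same answer.
// Negative years work too, because divisibility does not depend on sign.
bool identity_holds(int lo, int hi)
{
  for (int y = lo; y <= hi; ++y)
    if (y % 100 == 0 && (y % 400 == 0) != (y % 16 == 0))
      return false;
  return true;
}
```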

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:

   * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    constexpr uint32_t __offset       = __max_dividend / 2 / 100 * 100;
    const bool __is_multiple_of_100
      = __multiplier * (_M_y + __offset) < __bound;
-    return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+    // Usually we test _M_y % 400 == 0 but, when it's already known that
+    // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
                  ^^
                  N.B. this comment should say !=

+    return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;

If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

  return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs. Can you
confirm that result with your own tests?
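To check that both the patch's expression and the rearranged one, (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0, agree with the textbook rule, here is a minimal exhaustive test sketch. The names leap_reference, leap_patch, leap_rearranged and forms_agree are hypothetical, and the local m100 stands in for __is_multiple_of_100 but is computed with a plain %, so this checks correctness only and says nothing about performance.

```cpp
// Textbook Gregorian rule: divisible by 4, and not a century unless
// divisible by 400.
constexpr bool leap_reference(int y)
{ return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0); }

// Form used in the revised patch (m100 plays the role of __is_multiple_of_100).
constexpr bool leap_patch(int y)
{
  const bool m100 = y % 100 == 0;
  return (!m100 || y % 16 == 0) && y % 4 == 0;
}

// Rearranged form: y % 16 == 0 already implies y % 4 == 0.
constexpr bool leap_rearranged(int y)
{
  const bool m100 = y % 100 == 0;
  return (!m100 && y % 4 == 0) || y % 16 == 0;
}

// Exhaustively compares both forms against the reference over [lo, hi].
bool forms_agree(int lo, int hi)
{
  for (int y = lo; y <= hi; ++y)
    if (leap_patch(y) != leap_reference(y)
        || leap_rearranged(y) != leap_reference(y))
      return false;
  return true;
}
```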

However, my microbenchmark also shows that the simplistic code using
y % 100 often performs even better than the current code, which computes
__is_multiple_of_100 specifically to avoid the modulus operation. So maybe
my tests are bad.
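For anyone unfamiliar with replacing a modulus by a multiplication, the sketch below shows one well-known way of doing it: the divisibility test of Lemire, Kaser and Kurz, which uses one 64-bit multiply and one compare. This is a swapped-in technique, not the constants or formula libstdc++ uses for __is_multiple_of_100; the offset 32800 and all function names are hypothetical. As in the patch, adding an offset that is a multiple of 100 moves negative years into unsigned range without changing divisibility by 100.

```cpp
#include <cstdint>

// Hypothetical helper: the "magic" constant for divisor d, i.e.
// floor((2^64 - 1) / d) + 1.
constexpr std::uint64_t divisibility_magic(std::uint32_t d)
{ return UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1; }

// n is divisible by d iff n * magic (mod 2^64) <= magic - 1.
constexpr bool divisible(std::uint32_t n, std::uint64_t magic)
{ return n * magic <= magic - 1; }

// Hypothetical offset: 32800 = 328 * 100, so y + 32800 has the same residue
// modulo 100 as y and is non-negative for every y in [-32767, 32767].
constexpr bool is_multiple_of_100(int y)
{
  constexpr std::uint64_t magic = divisibility_magic(100);
  return divisible(static_cast<std::uint32_t>(y + 32800), magic);
}
```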

My rearranged expression above is equivalent to:

  return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

  return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4),
and the conditional expression produces slightly smaller code with GCC
(see https://gcc.gnu.org/PR101179 regarding that). But neither of these
seems to be an improvement over my first rearrangement above.
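Both single-modulus forms can be checked exhaustively against the leap-year rule as well. In this sketch leap_conditional and leap_shift are hypothetical names, and m100 is again a plain % stand-in for __is_multiple_of_100:

```cpp
// Divisor chosen with a conditional expression: 16 for multiples of 100,
// otherwise 4.
constexpr bool leap_conditional(int y)
{
  const bool m100 = y % 100 == 0;
  return y % (m100 ? 16 : 4) == 0;
}

// Same divisor without a conditional: 4 << 0 == 4 and 4 << 2 == 16.
constexpr bool leap_shift(int y)
{
  const bool m100 = y % 100 == 0;
  return y % (4 << (2 * m100)) == 0;
}
```

Both variants still perform a modulus by a value that is only known at run time, which may be one reason neither beats the rearrangement with two constant divisors.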

