On 23/06/21 14:16 +0100, Jonathan Wakely wrote:
On 23/06/21 12:45 +0100, Jonathan Wakely wrote:
On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:
I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

 * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr uint32_t __offset       = __max_dividend / 2 / 100 * 100;
  const bool __is_multiple_of_100
    = __multiplier * (_M_y + __offset) < __bound;
-    return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+    // Usually we test _M_y % 400 == 0 but, when it's already known that
+    // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
                ^^
                N.B. this comment should say !=

+    return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;

If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither of
these seems to improve compared to my first rearrangement above.

This version from Ulrich Drepper produces the smallest code of all
(and also performs well, if not the absolute fastest) in my
benchmarks:

 return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;

Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
Pushed to trunk.



commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
Author: Cassio Neri <cassio.n...@gmail.com>
Date:   Wed Jun 23 15:32:16 2021

    libstdc++: More efficient std::chrono::year::leap
    
    Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
    then it's divisible by 400 if and only if it's divisible by 16. The latter
    allows for better code generation.
    
    The expression is then either y%16 or y%4 which are both powers of two
    and so it can be rearranged to use simple bitmask operations.
    
    Co-authored-by: Jonathan Wakely <jwak...@redhat.com>
    Co-authored-by: Ulrich Drepper <drep...@redhat.com>
    
    libstdc++-v3/ChangeLog:
    
            * include/std/chrono (chrono::year::is_leap()): Optimize.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..863b6a27bdf 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// [1] https://github.com/cassioneri/calendar
 	// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
 
+	// Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
+	// so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0
+	// which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
+	// See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
 	constexpr uint32_t __multiplier   = 42949673;
 	constexpr uint32_t __bound        = 42949669;
 	constexpr uint32_t __max_dividend = 1073741799;
 	constexpr uint32_t __offset       = __max_dividend / 2 / 100 * 100;
 	const bool __is_multiple_of_100
 	  = __multiplier * (_M_y + __offset) < __bound;
-	return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+	return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
       }
 
       explicit constexpr

Reply via email to