On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:
I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.
(Revised patch below and attached.)
Best wishes,
Cassio.
---
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.
Tested on x86_64-pc-linux-gnu.
libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:
* include/std/chrono:
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100;
const bool __is_multiple_of_100
= __multiplier * (_M_y + __offset) < __bound;
- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+ // Usually we test _M_y % 400 == 0 but, when it's already known that
+ // _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
^^
N.B. this comment should say !=
+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
If y % 16 == 0 then y % 4 == 0 too. So we could write that as:
return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;
This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?
However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.
My rearranged expression above is equivalent to:
return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;
which can be written without branches:
return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;
However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC
(see https://gcc.gnu.org/PR101179 regarding that). But neither of
these seems to improve compared to my first rearrangement above.