[PING, PATCH v5] libstdc++: Remove UB from month and weekday additions and subtractions.

2023-12-10 Thread Cassio Neri
The following invoke signed integer overflow (UB) [1]:

  month   + months{MAX} // where MAX is the maximum value of months::rep
  month   + months{MIN} // where MIN is the maximum value of months::rep
  month   - months{MIN} // where MIN is the minimum value of months::rep
  weekday + days  {MAX} // where MAX is the maximum value of days::rep
  weekday - days  {MIN} // where MIN is the minimum value of days::rep

For the additions to MAX, the crux of the problem is that, in libstdc++,
months::rep and days::rep are int64_t. Other implementations use int32_t, cast
operands to int64_t and perform arithmetic operations without risk of
overflowing.

For month + months{MIN}, the implementation follows the Standard's "returns
clause" and evaluates:

   modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);

Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
but, unfortunately again, this is not possible for libstdc++.

For the subtraction of MIN, the problem is that -MIN is not representable.

It's fair to say that the intention is for these additions/subtractions to
be performed in modulus (12 or 7) arithmetic so that no overflow is expected.

To fix these UB, this patch implements:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

  template 
  unsigned __sub_modulo(unsigned __x, _T __y);

which respectively, returns the remainder of Euclidean division of, __x + __y
and __x - __y by __d without overflowing. These functions replace

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the reminder of __n, where __n is the result of the
addition or subtraction. Hence, these operations might invoke UB before __modulo
is called and thus, __modulo can't do anything to remediate the issue.

In addition to solve the UB issues, __add_modulo and __sub_modulo allow better
codegen (shorter and branchless) on x86-64 and ARM [2].

[1] https://godbolt.org/z/a9YfWdn57
[2] https://godbolt.org/z/Gh36cr7E4

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix + and - for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

Ping.

Good for trunk?

Changes with respect to previous versions:
  v5: Fix typos in commit message.
  v4: Remove UB also from operator-.
  v3: Fix screwed up email send with v2.
  v2: Replace _T with _Tp and _U with _Up. Remove copyright+license from test.

 libstdc++-v3/include/std/chrono  | 79 +---
 libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +
 libstdc++-v3/testsuite/std/time/month/2.cc   | 32 
 libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +++-
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 
 5 files changed, 151 insertions(+), 27 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index e4ba6eafceb..1b7cdb96e1c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -501,18 +501,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Helper to __add_modulo and __sub_modulo.
+  template 
+  constexpr auto
+  __modulo_offset()
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
+   return _Up(-1) - __b; // >= 255 + d - 1
+  }
+
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month with a shift in [0, d - 1] and __y is a duration count.
+  template 
+  constexpr unsigned
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, 

ping: [PATCH v5] libstdc++: Remove UB from month and weekday additions and subtractions.

2023-12-01 Thread Cassio Neri
ping.

On Sat, 25 Nov 2023 at 13:52, Cassio Neri  wrote:

> The following invoke signed integer overflow (UB) [1]:
>
>   month   + months{MAX} // where MAX is the maximum value of months::rep
>   month   + months{MIN} // where MIN is the maximum value of months::rep
>   month   - months{MIN} // where MIN is the minimum value of months::rep
>   weekday + days  {MAX} // where MAX is the maximum value of days::rep
>   weekday - days  {MIN} // where MIN is the minimum value of days::rep
>
> For the additions to MAX, the crux of the problem is that, in libstdc++,
> months::rep and days::rep are int64_t. Other implementations use int32_t,
> cast
> operands to int64_t and perform arithmetic operations without risk of
> overflowing.
>
> For month + months{MIN}, the implementation follows the Standard's "returns
> clause" and evaluates:
>
>modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);
>
> Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could
> help
> but, unfortunately again, this is not possible for libstdc++.
>
> For the subtraction of MIN, the problem is that -MIN is not representable.
>
> It's fair to say that the intention is for these additions/subtractions to
> be performed in modulus (12 or 7) arithmetic so that no overflow is
> expected.
>
> To fix these UB, this patch implements:
>
>   template 
>   unsigned __add_modulo(unsigned __x, _T __y);
>
>   template 
>   unsigned __sub_modulo(unsigned __x, _T __y);
>
> which respectively, returns the remainder of Euclidean division of, __x +
> __y
> and __x - __y by __d without overflowing. These functions replace
>
>   constexpr unsigned __modulo(long long __n, unsigned __d);
>
> which also calculates the reminder of __n, where __n is the result of the
> addition or subtraction. Hence, these operations might invoke UB before
> __modulo
> is called and thus, __modulo can't do anything to remediate the issue.
>
> In addition to solve the UB issues, __add_modulo and __sub_modulo allow
> better
> codegen (shorter and branchless) on x86-64 and ARM [2].
>
> [1] https://godbolt.org/z/a9YfWdn57
> [2] https://godbolt.org/z/Gh36cr7E4
>
> libstdc++-v3/ChangeLog:
>
> * include/std/chrono: Fix + and - for months and weekdays.
> * testsuite/std/time/month/1.cc: Add constexpr tests against
> overflow.
> * testsuite/std/time/month/2.cc: New test for extreme values.
> * testsuite/std/time/weekday/1.cc: Add constexpr tests against
> overflow.
> * testsuite/std/time/weekday/2.cc: New test for extreme values.
> ---
> Good for trunk?
>
> Changes with respect to previous versions:
>   v5: Fix typos in commit message.
>   v4: Remove UB also from operator-.
>   v3: Fix screwed up email send with v2.
>   v2: Replace _T with _Tp and _U with _Up. Remove copyright+license from
> test.
>
>  libstdc++-v3/include/std/chrono  | 79 +---
>  libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +
>  libstdc++-v3/testsuite/std/time/month/2.cc   | 32 
>  libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +++-
>  libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 
>  5 files changed, 151 insertions(+), 27 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
>  create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc
>
> diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> index e4ba6eafceb..1b7cdb96e1c 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -501,18 +501,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  namespace __detail
>  {
> -  // Compute the remainder of the Euclidean division of __n divided
> by __d.
> -  // Euclidean division truncates toward negative infinity and always
> -  // produces a remainder in the range of [0,__d-1] (whereas standard
> -  // division truncates toward zero and yields a nonpositive remainder
> -  // for negative __n).
> +  // Helper to __add_modulo and __sub_modulo.
> +  template 
> +  constexpr auto
> +  __modulo_offset()
> +  {
> +   using _Up = make_unsigned_t<_Tp>;
> +   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
> +   auto constexpr __b = _Up(__d * (__a / __d) - 1);
> +   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
> +   return _Up(-1) - __b; // >= 255 + d - 1
> +  }
> +
> +  // Compute the remainder of the Euclidean division of __x + __y
> divided by
> +  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
> +  // week

[PATCH v5] libstdc++: Remove UB from month and weekday additions and subtractions.

2023-11-25 Thread Cassio Neri
The following invoke signed integer overflow (UB) [1]:

  month   + months{MAX} // where MAX is the maximum value of months::rep
  month   + months{MIN} // where MIN is the maximum value of months::rep
  month   - months{MIN} // where MIN is the minimum value of months::rep
  weekday + days  {MAX} // where MAX is the maximum value of days::rep
  weekday - days  {MIN} // where MIN is the minimum value of days::rep

For the additions to MAX, the crux of the problem is that, in libstdc++,
months::rep and days::rep are int64_t. Other implementations use int32_t, cast
operands to int64_t and perform arithmetic operations without risk of
overflowing.

For month + months{MIN}, the implementation follows the Standard's "returns
clause" and evaluates:

   modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);

Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
but, unfortunately again, this is not possible for libstdc++.

For the subtraction of MIN, the problem is that -MIN is not representable.

It's fair to say that the intention is for these additions/subtractions to
be performed in modulus (12 or 7) arithmetic so that no overflow is expected.

To fix these UB, this patch implements:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

  template 
  unsigned __sub_modulo(unsigned __x, _T __y);

which respectively, returns the remainder of Euclidean division of, __x + __y
and __x - __y by __d without overflowing. These functions replace

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the reminder of __n, where __n is the result of the
addition or subtraction. Hence, these operations might invoke UB before __modulo
is called and thus, __modulo can't do anything to remediate the issue.

In addition to solve the UB issues, __add_modulo and __sub_modulo allow better
codegen (shorter and branchless) on x86-64 and ARM [2].

[1] https://godbolt.org/z/a9YfWdn57
[2] https://godbolt.org/z/Gh36cr7E4

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix + and - for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---
Good for trunk?

Changes with respect to previous versions:
  v5: Fix typos in commit message.
  v4: Remove UB also from operator-.
  v3: Fix screwed up email send with v2.
  v2: Replace _T with _Tp and _U with _Up. Remove copyright+license from test.

 libstdc++-v3/include/std/chrono  | 79 +---
 libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +
 libstdc++-v3/testsuite/std/time/month/2.cc   | 32 
 libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +++-
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 
 5 files changed, 151 insertions(+), 27 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index e4ba6eafceb..1b7cdb96e1c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -501,18 +501,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Helper to __add_modulo and __sub_modulo.
+  template 
+  constexpr auto
+  __modulo_offset()
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
+   return _Up(-1) - __b; // >= 255 + d - 1
+  }
+
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month with a shift in [0, d - 1] and __y is a duration count.
+  template 
+  constexpr unsigned
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != 

[PATCH v4] libstdc++: Remove UB from month and weekday additions and subtractions.

2023-11-20 Thread Cassio Neri
The following invoke signed integer overflow (UB) [1]:

  month   + months{MAX} // where MAX is the maximum value of months::rep
  month   + months{MIN} // where MIN is the maximum value of months::rep
  month   - months{MIN} // where MIN is the minimum value of months::rep
  weekday + days  {MAX} // where MAX is the maximum value of days::rep
  weekday - days  {MIN} // where MIN is the minimum value of days::rep

For the additions to MAX, the crux of the problem is that, in libstdc++,
months::rep and days::rep are int64_t. Other implementations use int32_t, cast
operands to int64_t and perform arithmetic operations without risk of
overflowing.

For month + months{MIN}, the implementations follows the Standard's "returns
clause" and evaluates:

   modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);

Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
but, unfortunately again, this is not possible for libstdc++.

For the subtraction of MIN, the problem is that -MIN is not representable.

It's fair to say that the intentions are for these additions/subtractions to
be in modulus (12 or 7) arithmetic so that no overflow is expected.

To fix these UB, this patch implements:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

  template 
  unsigned __sub_modulo(unsigned __x, _T __y);

which respectively, returns the remainder of Euclidean division of, __x + __y
and __x - __y by __d without overflowing. These functions replace

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the reminder of __n, where __n is the result of the
addition or subtraction. Hence, these operations might invoke UB before __modulo
is called and thus, __modulo can't do anything to remediate the issue.

In addition to solve the UB issues, __add_modulo and __sub_modulo allows better
codegen (shorter and branchless) on x86-64 and ARM [2].

[1] https://godbolt.org/z/a9YfWdn57
[2] https://godbolt.org/z/Gh36cr7E4

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix + and - for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

Changes with respect to previous versions:
  v4: Remove UB also from operator-.
  v3: Fix screwed up email send with v2.
  v2: Replace _T with _Tp and _U with _Up. Remove copyright+license from test.

 libstdc++-v3/include/std/chrono  | 79 +---
 libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +
 libstdc++-v3/testsuite/std/time/month/2.cc   | 32 
 libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +++-
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 
 5 files changed, 151 insertions(+), 27 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index e4ba6eafceb..1b7cdb96e1c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -501,18 +501,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Helper to __add_modulo and __sub_modulo.
+  template 
+  constexpr auto
+  __modulo_offset()
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
+   return _Up(-1) - __b; // >= 255 + d - 1
+  }
+
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month with a shift in [0, d - 1] and __y is a duration count.
+  template 
+  constexpr unsigned
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we subtract from 

Re: [PATCH v3] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-18 Thread Cassio Neri
Actually, disregard this patch. I'm preparing a better one which also
tackles UB in

month - months{INT_MIN}
weekday - days{INT_MIN}

Best wishes,
Cassio.

On Sat, 18 Nov 2023, 00:19 Cassio Neri,  wrote:

> The following functions invoke signed integer overflow (UB) for some
> extreme
> values of days and months [1]:
>
>   weekday operator+(const weekday& x, const days& y); // #1
>   month operator+(const month& x, const months& y);   // #2
>
> For #1 the problem is that in libstdc++ days::rep is int64_t. Other
> implementations use int32_t and cast operands to int64_t. Hence then
> perform
> arithmetic operations without fear of overflowing. For instance, #1
> evaluates:
>
>   modulo(static_cast(unsigned{x}._M_wd) + __y.count(), 7);
>
> For x86-64, long long is int64 so the cast is useless.  For #2, casting to
> a
> larger type could help but all implementations follow the Standard's
> "Returns
> clause" and evaluate:
>
>modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);
>
> Hence, overflow occurs when __y.count() is the minimum value of its type.
> When
> long long is larger than months::rep, this is a fix:
>
>modulo(static_cast(unsigned{__x}) + 11 + __y.count(), 12);
>
> Again, this is not possible for libstdc++.  The fix uses this new function:
>
>   template 
>   unsigned __add_modulo(unsigned __x, _T __y);
>
> which returns the remainder of Euclidean division of __x +__y by __d
> without
> overflowing. This function replaces
>
>   constexpr unsigned __modulo(long long __n, unsigned __d);
>
> In addition to solve the UB issues, __add_modulo allows shorter branchless
> code
> on x86-64 and ARM [2].
>
> [1] https://godbolt.org/z/WqvosbrvG
> [2] https://godbolt.org/z/o63794GEE
>
> libstdc++-v3/ChangeLog:
>
> * include/std/chrono: Fix operator+ for months and weekdays.
> * testsuite/std/time/month/1.cc: Add constexpr tests against
> overflow.
> * testsuite/std/time/month/2.cc: New test for extreme values.
> * testsuite/std/time/weekday/1.cc: Add constexpr tests against
> overflow.
> * testsuite/std/time/weekday/2.cc: New test for extreme values.
> ---
>
> Changes with respect to previous versions:
>  v3: Fix screwed up email send with v2. (Sorry about that. I shall learn at
>  some point.)
>  v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from
> test.
>
>  libstdc++-v3/include/std/chrono  | 61 
>  libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
>  libstdc++-v3/testsuite/std/time/month/2.cc   | 30 ++
>  libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
>  libstdc++-v3/testsuite/std/time/weekday/2.cc | 30 ++
>  5 files changed, 114 insertions(+), 24 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
>  create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc
>
> diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> index 10bdd1c4ede..691bb106bb9 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  namespace __detail
>  {
> -  // Compute the remainder of the Euclidean division of __n divided
> by __d.
> -  // Euclidean division truncates toward negative infinity and always
> -  // produces a remainder in the range of [0,__d-1] (whereas standard
> -  // division truncates toward zero and yields a nonpositive remainder
> -  // for negative __n).
> +  // Compute the remainder of the Euclidean division of __x + __y
> divided by
> +  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
> +  // weekday/month and an offset in [0, d - 1] and __y is a duration
> count.
> +  // For instance, [time.cal.month.nonmembers] says that given month
> x and
> +  // months y, to get x + y one must calculate:
> +  //
> +  // modulo(static_cast(unsigned{x}) + (y.count() - 1),
> 12) + 1.
> +  //
> +  // Since y.count() is a 64-bits signed value the subtraction
> y.count() - 1
> +  // or the addition of this value with static_cast long>(unsigned{x})
> +  // might overflow.  This function can be used to avoid this problem:
> +  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
> +  // (More details in the implementation of operator+(month, months).)
> +  template 
>constexpr unsigned
> -  __modulo(long long __n, unsigned __d)
> -  {
> -   if (__n >= 0)
> - return __n % __d;
> -   else
> - return (__d + (__n % __d)) %

[PATCH v3] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-17 Thread Cassio Neri
The following functions invoke signed integer overflow (UB) for some extreme
values of days and months [1]:

  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1 the problem is that in libstdc++ days::rep is int64_t. Other
implementations use int32_t and cast operands to int64_t. Hence then perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast(unsigned{x}._M_wd) + __y.count(), 7);

For x86-64, long long is int64 so the cast is useless.  For #2, casting to a
larger type could help but all implementations follow the Standard's "Returns
clause" and evaluate:

   modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++.  The fix uses this new function:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of Euclidean division of __x +__y by __d without
overflowing. This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

In addition to solve the UB issues, __add_modulo allows shorter branchless code
on x86-64 and ARM [2].

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

Changes with respect to previous versions:
 v3: Fix screwed up email send with v2. (Sorry about that. I shall learn at
 some point.)
 v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from test.

 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/month/2.cc   | 30 ++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 30 ++
 5 files changed, 114 insertions(+), 24 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..691bb106bb9 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 
1
+  // or the addition of this value with static_cast(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template 
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we from _Up(__y)"subtract"  an __offset >=
+   // 255 + d - 1 to make room for the addition to __x and shift the modulo
+   // to the correct value.
+   auto constexpr __a = _Up(-1) - _Up(255 

[PATCH v2] The following functions invoke signed integer overflow (UB) for some extreme values of days and months [1]:

2023-11-17 Thread Cassio Neri
  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1 the problem is that in libstdc++ days::rep is int64_t. Other
implementations use int32_t and cast operands to int64_t. Hence then perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast(unsigned{x}._M_wd) + __y.count(), 7);

For x86-64, long long is int64 so the cast is useless.  For #2, casting to a
larger type could help but all implementations follow the Standard's "Returns
clause" and evaluate:

   modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++.  The fix uses this new function:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of Euclidean division of __x +__y by __d without
overflowing. This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

In addition to solve the UB issues, __add_modulo allows shorter branchless code
on x86-64 and ARM [2].

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---
 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 3 files changed, 54 insertions(+), 24 deletions(-)

 Changes with respect to previous versions:
 v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from test.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..691bb106bb9 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 
1
+  // or the addition of this value with static_cast(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template 
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we from _Up(__y)"subtract"  an __offset >=
+   // 255 + d - 1 to make room for the addition to __x and shift the modulo
+   // to the correct value.
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - _Up(255 + d - 1) and b % d = d - 1.
+   auto const __offset = __y >= 0 ? _Up(0) : __b - _Up(-1);
+   return (__x + _Up(__y) + __offset) % __d;
   }

   inline constexpr unsigned __days_per_month[12]
@@ -700,8 +720,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   friend constexpr month
   operator+(const month& __x, const 

[PATCH] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-16 Thread Cassio Neri
The following functions invoke signed integer overflow (UB) for some extreme
values of days and months [1]:

  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1, the crux of the problem is that, in libstdc++, days::rep is int64_t.
Other implementations use int32_t and cast operands to int64_t and perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast(unsigned{x}._M_wd) + __y.count(), 7);

Sadly, libstdc++ doesn't have this luxury.  For #2, casting to a larger type
could help but all implementations follow the Standard's "Returns clause"
and evaluate:

   modulo(static_cast(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++. To fix these UB, this patch
implements:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of Euclidean division of __x +__y by __d without
overflowing.  This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the reminder but takes the sum __n as argument at which
point the overflow might have already occurred.

In addition to solve the UB issues, __add_modulo allows shorter branchless code
on x86-64 and ARM [2].

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

If desirable, I think I'm able to do something similar for operator-(x, y)
(month/weekday x and months/days y) which is specified as:
Returns: x + -y;
All implementations follow the above and -y overflows when y has the minimum
value of its type.

 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/month/2.cc   | 47 +++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 47 +++
 5 files changed, 148 insertions(+), 24 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..02087a9374c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 
1
+  // or the addition of this value with static_cast(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template 
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _T __y)
+  {
+   using _U = make_unsigned_t<_T>;
+   // For __y >= 0, _U(__y) has the same mathematical value as __y and this
+   // function simply returns (__x + _U(__y)) % d.  Typically, this doesn't
+   // overflow since the range of _U contains many more positive values
+   // than _T's.  For __y < 0, _U(__y) has a mathematical value in the
+   // upper-half range of _U so that adding a positive value to it might
+   // overflow.  Moreover, most likely, _U(__y) != __y mod d.  To fix both
+   // issues we "subtract" from _U(__y) an __offset 

[PATCH] libstdc++: Improve operator-(weekday x, weekday y).

2023-11-13 Thread Cassio Neri
The current implementation calls __detail::__modulo which is relatively
expensive.

A better implementation is possible if we assume that x.ok() && y.ok() == true,
so that n = x.c_encoding() - y.c_encoding() is in [-6, 6]. In this case, it
suffices to return n >= 0 ? n : n + 7.

The above is allowed by [time.cal.wd.nonmembers]/5: the returned value is
unspecified when x.ok() || y.ok() == false.

The assembly emitted for x86-64 and ARM can be seen in:
https://godbolt.org/z/nMdc5vv9n.

libstdc++-v3/ChangeLog:

* include/std/chrono:
---

OK for trunk?

 libstdc++-v3/include/std/chrono | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10e868e5a03..6131e7e97b3 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1036,8 +1036,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   friend constexpr days
   operator-(const weekday& __x, const weekday& __y) noexcept
   {
-   auto __n = static_cast(__x._M_wd) - __y._M_wd;
-   return days{__detail::__modulo(__n, 7)};
+   const auto __n = __x.c_encoding() - __y.c_encoding();
+   return static_cast(__n) >= 0 ? days{__n} : days{__n + 7};
   }
 };

--
2.41.0



[PATCH] Fix UB in weekday::weekday(sys_days) and add test.

2023-11-11 Thread Cassio Neri
The following has undefined behaviour (signed overflow) [1]:
weekday max{sys_days{days{numeric_limits::max()}}};

The issue is in this line when __n is very large and __n + 4 overflows:
return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);

In addition to fixing this bug, the new implementation makes the compiler emit
shorter and branchless code for x86-64 and ARM [2].

[1] https://godbolt.org/z/1s5bv7KfT
[2] https://godbolt.org/z/zKsabzrhs

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix weekday::_S_from_days
* testsuite/std/time/weekday/1.cc: Add test for overflow.
---

Good for trunk?

 libstdc++-v3/include/std/chrono  | 11 +--
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10e868e5a03..c00dd133173 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -930,8 +930,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static constexpr weekday
   _S_from_days(const days& __d)
   {
-   auto __n = __d.count();
-   return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);
+   using _Rep = days::rep;
+   using _URep = make_unsigned_t<_Rep>;
+   const auto __n = __d.count();
+   const auto __m = static_cast<_URep>(__n);
+
+   // 1970-01-01 (__n =  0, __m = 0) -> Thursday (4)
+   // 1969-31-12 (__n = -1, __m = _URep(-1)) -> Wednesday (3)
+   const auto __offset = __n >= 0 ? _URep(4) : 3 - _URep(-1) % 7 - 7;
+   return weekday((__m + __offset) % 7);
   }

 public:
diff --git a/libstdc++-v3/testsuite/std/time/weekday/1.cc 
b/libstdc++-v3/testsuite/std/time/weekday/1.cc
index 00278c8b01c..e89fca47d4b 100644
--- a/libstdc++-v3/testsuite/std/time/weekday/1.cc
+++ b/libstdc++-v3/testsuite/std/time/weekday/1.cc
@@ -20,6 +20,7 @@
 // Class template day [time.cal.weekday]

 #include 
+#include 

 constexpr void
 constexpr_weekday()
@@ -37,6 +38,14 @@ constexpr_weekday()
   static_assert(weekday{3}[2].weekday() == weekday{3});
   static_assert(weekday{3}[last].weekday() == weekday{3});

+  // Test for UB (overflow).
+  {
+using rep = days::rep;
+using std::numeric_limits;
+constexpr weekday max{sys_days{days{numeric_limits::max()}}};
+constexpr weekday min{sys_days{days{numeric_limits::min()}}};
+  }
+
   static_assert(weekday{sys_days{1900y/January/1}} == Monday);
   static_assert(weekday{sys_days{1970y/January/1}} == Thursday);
   static_assert(weekday{sys_days{2020y/August/21}} == Friday);
--
2.41.0



[PATCH v3] libstdc++: Simplify year::is_leap().

2023-11-11 Thread Cassio Neri
The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].

In leap years calculation, it's correct to replace the divisibility check by
100 with the one by 25. It turns out that _M_y % 25 == 0 also saves the ror
instruction [2]. Therefore, the obfuscation is not required.

[1] https://godbolt.org/z/5PaEv6a6b
[2] https://godbolt.org/z/55G8rn77e

libstdc++-v3/ChangeLog:

* include/std/chrono: Clear code.
---

Previous versions of this patch failed to apply. Hope this one works.

Good for trunk?

 libstdc++-v3/include/std/chrono | 40 -
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10e868e5a03..e18f57229e8 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -835,29 +835,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool
   is_leap() const noexcept
   {
-   // Testing divisibility by 100 first gives better performance, that is,
-   // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0;
-
-   // It gets even faster if _M_y is in [-536870800, 536870999]
-   // (which is the case here) and _M_y % 100 is replaced by
-   // __is_multiple_of_100 below.
+   // Testing divisibility by 100 first gives better performance [1], i.e.,
+   // return _M_y % 100 == 0 ? _M_y % 400 == 0 : _M_y % 16 == 0;
+   // Furthermore, if _M_y % 100 == 0, then _M_y % 400 == 0 is equivalent
+   // to _M_y % 16 == 0, so we can simplify it to
+   // return _M_y % 100 == 0 ? _M_y % 16 == 0 : _M_y % 4 == 0.  // #1
+   // Similarly, we can replace 100 with 25 (which is good since
+   // _M_y % 25 == 0 requires one fewer instruction than _M_y % 100 == 0
+   // [2]):
+   // return _M_y % 25 == 0 ? _M_y % 16 == 0 : _M_y % 4 == 0.  // #2
+   // Indeed, first assume _M_y % 4 != 0.  Then _M_y % 16 != 0 and hence,
+   // _M_y % 4 == 0 and _M_y % 16 == 0 are both false.  Therefore, #2
+   // returns false as it should (regardless of _M_y % 25.) Now assume
+   // _M_y % 4 == 0.  In this case, _M_y % 25 == 0 if, and only if,
+   // _M_y % 100 == 0, that is, #1 and #2 are equivalent.  Finally, #2 is
+   // equivalent to
+   // return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0.

// References:
// [1] https://github.com/cassioneri/calendar
-   // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
-
-   // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
-   // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
-   // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
-   // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
-
-   constexpr uint32_t __multiplier   = 42949673;
-   constexpr uint32_t __bound= 42949669;
-   constexpr uint32_t __max_dividend = 1073741799;
-   constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
-   const bool __is_multiple_of_100
- = __multiplier * (_M_y + __offset) < __bound;
-   return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
+   // [2] https://godbolt.org/z/55G8rn77e
+   // [3] https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
+   return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0;
   }

   explicit constexpr
--
2.41.0



[PATCH v4] libstdc++: Remove unnecessary "& 1" from year_month_day_last::day().

2023-11-11 Thread Cassio Neri
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.

libstdc++-v3/ChangeLog:

* include/std/chrono: Remove &1 from year_month_day_last::day().
---
Previous versions of this patch failed to apply. I hope it works this time.

OK for trunk?

 libstdc++-v3/include/std/chrono | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10e868e5a03..a826982803b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1800,22 +1800,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
const auto __m = static_cast(month());

-   // Excluding February, the last day of month __m is either 30 or 31 or,
-   // in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+   // The result is unspecified if __m < 1 or __m > 12.  Hence, assume
+   // 1 <= __m <= 12.  For __m != 2, day() == 30 or day() == 31 or, in
+   // other words, day () == 30 | b, where b is in {0, 1}.

-   // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd.
-   // Hence, b = __m & 1 = (__m ^ 0) & 1.
+   // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if, __m is
+   // odd.  Hence, b = __m & 1 = (__m ^ 0) & 1.

-   // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if __m is 
even.
-   // Hence, b = (__m ^ 1) & 1.
+   // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if, __m is
+   // even.  Hence, b = (__m ^ 1) & 1.

// Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
// __m >= 8, that is, c = __m >> 3.

-   // The above mathematically justifies this implementation whose
-   // performance does not depend on look-up tables being on the L1 cache.
-   return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30
-   : _M_y.is_leap() ? 29 : 28};
+   // Since 30 = (0)_2 and __m <= 31 = (1)_2, the "& 1" in b's
+   // calculation is unnecessary.
+
+   // The performance of this implementation does not depend on look-up
+   // tables being on the L1 cache.
+   return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30
+ : _M_y.is_leap() ? 29 : 28};
   }

   constexpr
--
2.41.0



[PATCH] libstdc++: Remove unnecessary "& 1" from year_month_day_last::day().

2023-11-11 Thread Cassio Neri
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.

libstdc++-v3/ChangeLog:

* include/std/chrono: Remove &1 from year_month_day_last::day().
---

Previous versions of this patch failed to apply. I hope it works this time.

OK for trunk?

 libstdc++-v3/include/std/chrono | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10e868e5a03..a826982803b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1800,22 +1800,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
 const auto __m = static_cast(month());

-// Excluding February, the last day of month __m is either 30 or 31 or,
-// in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+// The result is unspecified if __m < 1 or __m > 12.  Hence, assume
+// 1 <= __m <= 12.  For __m != 2, day() == 30 or day() == 31 or, in
+// other words, day () == 30 | b, where b is in {0, 1}.

-// If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd.
-// Hence, b = __m & 1 = (__m ^ 0) & 1.
+// If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if, __m is
+// odd.  Hence, b = __m & 1 = (__m ^ 0) & 1.

-// If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if __m is even.
-// Hence, b = (__m ^ 1) & 1.
+// If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if, __m is
+// even.  Hence, b = (__m ^ 1) & 1.

 // Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
 // __m >= 8, that is, c = __m >> 3.

-// The above mathematically justifies this implementation whose
-// performance does not depend on look-up tables being on the L1 cache.
-return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30
-: _M_y.is_leap() ? 29 : 28};
+// Since 30 = (0)_2 and __m <= 31 = (1)_2, the "& 1" in b's
+// calculation is unnecessary.
+
+// The performance of this implementation does not depend on look-up
+// tables being on the L1 cache.
+return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30
+  : _M_y.is_leap() ? 29 : 28};
   }

   constexpr
--
2.41.0


[PATCH v2] Simplify year::is_leap().

2023-11-10 Thread Cassio Neri
The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].

In leap years calculation, it's mathematically correct to replace the
divisibility check by 100 with the one by 25. It turns out that
_M_y % 25 == 0 also saves the ror instruction [2]. Therefore, the
obfuscation is not required.

[1] https://godbolt.org/z/5PaEv6a6b
[2] https://godbolt.org/z/55G8rn77e

libstdc++-v3/ChangeLog:

* include/std/chrono:

---
 libstdc++-v3/include/std/chrono | 38 ++
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono
b/libstdc++-v3/include/std/chrono
index 10e868e5a03..5707ed002a2 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -835,29 +835,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool
   is_leap() const noexcept
   {
- // Testing divisibility by 100 first gives better performance, that is,
- // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0;
-
- // It gets even faster if _M_y is in [-536870800, 536870999]
- // (which is the case here) and _M_y % 100 is replaced by
- // __is_multiple_of_100 below.
+ // Testing divisibility by 100 first gives better performance [1], i.e.,
+ // return y % 100 == 0 ? y % 400 == 0 : y % 16 == 0;
+ // Furthermore, if y % 100 == 0, then y % 400 == 0 is equivalent to
+ // y % 16 == 0, so we can simplify it to
+ // return y % 100 == 0 ? y % 16 == 0 : y % 4 == 0.  // #1
+ // Similarly, we can replace 100 with 25 (which is good since
+ // y % 25 == 0 requires one fewer instruction than y % 100 == 0 [2]):
+ // return y % 25 == 0 ? y % 16 == 0 : y % 4 == 0.  // #2
+ // Indeed, first assume y % 4 != 0.  Then y % 16 != 0 and hence,
+ // y % 4 == 0 and y % 16 == 0 are both false.  Therefore, #2 returns
+ // false as it should (regardless of y % 25.) Now assume y % 4 == 0.  In
+ // this case, y % 25 == 0 if, and only if, y % 100 == 0, that is, #1 and
+ // #2 are equivalent.  Finally, #2 is equivalent to
+ // return (y & (y % 25 == 0 ? 15 : 3)) == 0.

  // References:
  // [1] https://github.com/cassioneri/calendar
- // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
-
- // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
- // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
- // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
- // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
-
- constexpr uint32_t __multiplier   = 42949673;
- constexpr uint32_t __bound= 42949669;
- constexpr uint32_t __max_dividend = 1073741799;
- constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
- const bool __is_multiple_of_100
-  = __multiplier * (_M_y + __offset) < __bound;
- return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
+ // [2] https://godbolt.org/z/55G8rn77e
+ // [3] https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
+ return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0;
   }

   explicit constexpr


[PATCH v2] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-10 Thread Cassio Neri
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.

libstdc++-v3/ChangeLog:

* include/std/chrono:

---
 libstdc++-v3/include/std/chrono | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono
b/libstdc++-v3/include/std/chrono
index 10e868e5a03..a826982803b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1800,22 +1800,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
  const auto __m = static_cast(month());

- // Excluding February, the last day of month __m is either 30 or 31 or,
- // in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+ // The result is unspecified if __m < 1 or __m > 12.  Hence, assume
+ // 1 <= __m <= 12.  For __m != 2, day() == 30 or day() == 31 or, in
+ // other words, day () == 30 | b, where b is in {0, 1}.

- // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd.
- // Hence, b = __m & 1 = (__m ^ 0) & 1.
+ // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if, __m is
+ // odd.  Hence, b = __m & 1 = (__m ^ 0) & 1.

- // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if __m is even.
- // Hence, b = (__m ^ 1) & 1.
+ // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if, __m is
+ // even.  Hence, b = (__m ^ 1) & 1.

  // Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
  // __m >= 8, that is, c = __m >> 3.

- // The above mathematically justifies this implementation whose
- // performance does not depend on look-up tables being on the L1 cache.
- return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30
-: _M_y.is_leap() ? 29 : 28};
+ // Since 30 = (0)_2 and __m <= 31 = (1)_2, the "& 1" in b's
+ // calculation is unnecessary.
+
+ // The performance of this implementation does not depend on look-up
+ // tables being on the L1 cache.
+ return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30
+  : _M_y.is_leap() ? 29 : 28};
   }

   constexpr


[PATCH] Simplify year::is_leap().

2023-11-05 Thread Cassio Neri
The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].

In leap years calculation, it's mathematically correct to replace the
divisibility check by 100 with the one by 25. It turns out that
_M_y % 25 == 0 also saves the ror instruction [2]. Therefore, the
obfuscation is not required.

[1] https://godbolt.org/z/5PaEv6a6b
[2] https://godbolt.org/z/55G8rn77e

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono
b/libstdc++-v3/include/std/chrono
index 10e868e5a03..a34b3977d59 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -835,29 +835,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool
   is_leap() const noexcept
   {
- // Testing divisibility by 100 first gives better performance, that is,
- // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0;
-
- // It gets even faster if _M_y is in [-536870800, 536870999]
- // (which is the case here) and _M_y % 100 is replaced by
- // __is_multiple_of_100 below.
+ // Testing divisibility by 100 first gives better performance [1], i.e.,
+ // return y % 100 == 0 ? y % 400 == 0 : y % 16 == 0;
+ // Furthermore, if y % 100 == 0, then y % 400 == 0 is equivalent to
+ // y % 16 == 0, so we can simplify it to
+ // return y % 100 == 0 ? y % 16 == 0 : y % 4 == 0. // #1
+ // Similarly, we can replace 100 with 25 (which is good since y % 25 == 0
+ // requires one fewer instruction than y % 100 == 0 [2]):
+ // return y % 25 == 0 ? y % 16 == 0 : y % 4 == 0. // #2
+ // Indeed, first assume y % 4 != 0. Then y % 16 != 0 and hence, y % 4 == 0
+ // and y % 16 == 0 are both false. Therefore, #2 returns false as it
+ // should (regardless of y % 25.) Now assume y % 4 == 0. In this case,
+ // y % 25 == 0 if, and only if, y % 100 == 0, that is, #1 and #2 are
+ // equivalent. Finally, #2 is equivalent to
+ // return (y & (y % 25 == 0 ? 15 : 3)) == 0.

  // References:
  // [1] https://github.com/cassioneri/calendar
- // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
-
- // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
- // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
- // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
- // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
-
- constexpr uint32_t __multiplier   = 42949673;
- constexpr uint32_t __bound= 42949669;
- constexpr uint32_t __max_dividend = 1073741799;
- constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
- const bool __is_multiple_of_100
-  = __multiplier * (_M_y + __offset) < __bound;
- return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
+ // [2] https://godbolt.org/z/55G8rn77e
+ // [3] https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
+ return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0;
   }

   explicit constexpr


Re: [PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Cassio Neri
I could not find any entry in gcc's bugzilla for that. Perhaps my search
wasn't good enough.


On Sun, 5 Nov 2023 at 15:58, Marc Glisse  wrote:

> On Sun, 5 Nov 2023, Cassio Neri wrote:
>
> > When year_month_day_last::day() was implemented, Dr. Matthias Kretz
> realised
> > that the operation "& 1" wasn't necessary but we did not patch it at that
> > time. This patch removes the unnecessary operation.
>
> Is there an entry in gcc's bugzilla about having the optimizer handle this
> kind of optimization?
>
> unsigned f(unsigned x){
>if(x>=32)__builtin_unreachable();
>return 30|(x&1); // --> 30|x
> }
>
> (that optimization would come in addition to your patch, doing the
> optimization by hand is still a good idea)
>
> It looks like the criterion would be a|(b) when the possible 1 bits of b
> are included in the certainly 1 bits of a|c.
>
> --
> Marc Glisse
>


[PATCH] Remove unnecessary "& 1" in year_month_day_last::day()

2023-11-05 Thread Cassio Neri
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono
b/libstdc++-v3/include/std/chrono
index 10e868e5a03..c979a5d05dd 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1800,8 +1800,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
  const auto __m = static_cast(month());

- // Excluding February, the last day of month __m is either 30 or 31 or,
- // in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+ // Assume 1 <= __m <= 12, otherwise month().ok() == false and the result
+ // of day() is unspecified. Excluding February, the last day of month __m
+ // m is either 30 or 31 or, in another words, it is 30 | b, where b is in
+ // {0, 1}.

  // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd.
  // Hence, b = __m & 1 = (__m ^ 0) & 1.
@@ -1812,10 +1814,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
  // __m >= 8, that is, c = __m >> 3.

+ // Since 30 = (0)_2 and __m <= 31 = (1)_2, we have:
+ // 30 | ((__m ^ c) & 1) == 30 | (__m ^ c), that is, the "& 1" is
+ // unnecessary.
+
  // The above mathematically justifies this implementation whose
  // performance does not depend on look-up tables being on the L1 cache.
- return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30
-: _M_y.is_leap() ? 29 : 28};
+ return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30
+ : _M_y.is_leap() ? 29 : 28};
   }

   constexpr


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-24 Thread Cassio Neri via Gcc-patches
Thanks Jonathan.

Here are some benchmarks (assembly in [1]):
https://quick-bench.com/q/jclBXmi4QLDcRMLuuVpxTUsFmQw

Unfortunately, quick-bench times out unless some implementations are
commented out. You can copy the code and run it locally (needs google
benchmark) to get the full picture.

I really like Ulrich Drepper's implementation (version 8). GCC 11 emmits
really good code and looking at previous versions this has been very
consistent. This implementation also highlights very clearly that my hack
to calculate __is_multiple_of_100 (as opposed to % 100 used in version 7)
saves a ror instruction with remarkable performance effects.

In my AMD Ryzen 7 box, his implementation doesn't seem to be the fastest.
Nevertheless, compared to other fast alternatives, performance differences
are small and results seem very much platform dependent. In my opinion, his
implementation is the best, especially, given recent compiler changes. My
reasoning follows.

My original motivation was contrasting 'year % 400 == 0' with 'year % 16'
as in versions 1 and 2:

  __is_multiple_of_100 ? year % 400 == 0 : year % 4 == 0; // version 1
  __is_multiple_of_100 ? year % 16 == 0 : year % 4 == 0; // version 2

It's fair to expect that version 2 won't be slower than version 1. However,
this is the case and by quite a big margin. The emitted instructions are
very reasonable and, I guess, the issue is the choice of registers. I've
reimplemented half of version 2 in inline asm using similar register
choices as version 1 and got the performance boost that I was expecting.
(By no means I'm suggesting a non portable implementation here. It was just
an exercise to make my point. Also, it performs very poorly when compiled
by clang.)

The point here is that code emitted for sources like versions 1 and 2
(involving %) seems sensitive to very small variations and has been
changing across compiler releases in the last 3 years or so. (This can be
checked on godbolt.) Clang also seems very confused [2]. Even if the
emitted code for version 2 and others were good today, I fear a new
compiler release could regress. The stability of the code emitted for
Ulrich Drepper's implementation is much more reliable, its performance is
very good and its clarity is only compromised by my own hack for
__is_multiple_of_100. (Recall, however, that this hack is crucial for his
implementation's performance.)

Best wishes,
Cassio.

[1] https://godbolt.org/z/TbGYEqx33
[2] https://bugs.llvm.org/show_bug.cgi?id=50334

On Wed, Jun 23, 2021 at 6:52 PM Jonathan Wakely  wrote:

> On 23/06/21 18:51 +0100, Jonathan Wakely wrote:
> >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
> >Pushed to trunk.
> >
> >
> >
>
> >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
> >Author: Cassio Neri 
> >Date:   Wed Jun 23 15:32:16 2021
> >
> >libstdc++: More efficient std::chrono::year::leap
> >
> >Simple change to std::chrono::year::is_leap. If a year is multiple of
> 100,
> >then it's divisible by 400 if and only if it's divisible by 16. The
> latter
> >allows for better code generation.
> >
> >The expression is then either y%16 or y%4 which are both powers of two
> >and so it can be rearranged to use simple bitmask operations.
> >
> >Co-authored-by: Jonathan Wakely 
> >Co-authored-by: Ulrich Drepper 
> >
> >libstdc++-v3/ChangeLog:
> >
> >* include/std/chrono (chrono::year::is_leap()): Optimize.
> >
> >diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> >index 4631a727d73..863b6a27bdf 100644
> >--- a/libstdc++-v3/include/std/chrono
> >+++ b/libstdc++-v3/include/std/chrono
> >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   // [1] https://github.com/cassioneri/calendar
> >   // [2]
> https://accu.org/journals/overload/28/155/overload155.pdf#page=16
> >
> >+  // Furthermore, if y%100 != 0, then y%400==0 is equivalent to
> y%16==0,
> >+  // so we can rearrange the expression to (mult_100 ? y % 4 : y %
> 16)==0
>
> But Ulrich pointed out I got my boolean logic all muddled up in the
> comment. Fixed with the attached patch!
>
>


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Cassio Neri via Gcc-patches
I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 const bool __is_multiple_of_100
   = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr

On Fri, May 21, 2021 at 7:05 PM Koning, Paul  wrote:
>
>
>
> > On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches 
> >  wrote:
> >
> > Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
> > then it's divisible by 400 if and only if it's divisible by 16. The latter
> > allows for better code generation.
>
> I wonder if the optimizer could be taught to do that.
>
> The change seems correct but it is very confusing; at the very least the 
> reasoning you gave should be stated in a comment on that check.
>
> paul
>
>
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

	* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 	const bool __is_multiple_of_100
 	  = __multiplier * (_M_y + __offset) < __bound;
-	return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+	// Usually we test _M_y % 400 == 0 but, when it's already known that
+	// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
+	return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr


[PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Cassio Neri via Gcc-patches
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..6221d02c1b3 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
  const bool __is_multiple_of_100
   = __multiplier * (_M_y + __offset) < __bound;
- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr


Re: [PATCH 1/4] libstdc++: More efficient date from days.

2021-02-25 Thread Cassio Neri via Gcc-patches
Hi Jonathan,

The issue is that I didn't cast __dp.count() to uint32_t:

-  const auto __r0 = __dp.count() + __r2_e3;
+  const auto __r0 = static_cast(__dp.count()) + __r2_e3;

The above would be a better fix. Indeed, __r0 belongs to [0, 2^32[ which allows
all arithmetics that follow to be performed on uint32_t values. For performance
this is better than using signed integers.

I wrongly assumed __dp.count() was int32_t which would make __r0 also uint32_t.
It turns out, it is int64_t so are __r0 and other expressions including
__y1 + __z2. Hence, we're losing a bit of performance. I'm glad the compiler
warned us. (Although I don't understand why I didn't get the warning.)

Thanks,
Cassio.

On Thu, Feb 25, 2021 at 11:56 AM Jonathan Wakely  wrote:
>
> On 24/02/21 17:28 +, Jonathan Wakely wrote:
> >On 23/02/21 13:24 +0000, Cassio Neri via Libstdc++ wrote:
> >>This patch reimplements std::chrono::year_month_day::_S_from_days() which
> >>retrieves a date from the number of elapsed days since 1970/01/01.  The new
> >>implementation is based on Proposition 6.3 of Neri and Schneider, "Euclidean
> >>Affine Functions and Applications to Calendar Algorithms" available at
> >>https://arxiv.org/abs/2102.06959.
> >>
> >>The aforementioned paper benchmarks the implementation against several
> >>counterparts, including libc++'s (which is identical to the current
> >>implementation).  The results, shown in Figure 4, indicate the new 
> >>algorithm is
> >>2.2 times faster than the current one.
> >>
> >>The patch adds a test which loops through all integers in [-12687428, 
> >>11248737],
> >>and for each of them, gets the corresponding date and compares the result
> >>against its expected value.  The latter is calculated using a much simpler 
> >>and
> >>easy to understand algorithm but which is also much slower.
> >>
> >>The interval used in the test covers the full range of values for which a
> >>roundtrip must work [time.cal.ymd.members].  Despite its completeness the 
> >>test
> >>runs in a matter of seconds.
> >>
> >>libstdc++-v3/ChangeLog:
> >>
> >>   * include/std/chrono:
> >>   * testsuite/std/time/year_month_day/3.cc: New test.
> >
> >Thanks! I'm committing this to trunk (it only changes new C++20
> >material so OK during stage 4 ... and anyway it's both faster and
> >better tested than the old code).
> >
> >I've tweaked it slightly to keep some lines below 80 columns, but no
> >changes except whitespace.
>
> I've made the attached tweak, to avoid a narrowing conversion from
> long to int. Tested x86_64-linux, committed to trunk.
>


[PATCH 4/4] libstdc++: More efficient last day of month.

2021-02-23 Thread Cassio Neri via Gcc-patches
This patch reimplements std::chrono::year_month_day_last:day() which yields the
last day of a particular month.  The current implementation uses a look-up table
implemented as an unsigned[12] array.  The new implementation instead
is based on
the fact that a month m in [1, 12], except for m == 2 (February), is
either 31 or
30 days long and m's length depends on two things: m's parity and whether m >= 8
or not. These two conditions are determined by the 0th and 3th bit of m and,
therefore, cheap and straightforward bit-twiddling can provide the right result.

Measurements in x86_64 [1] suggest a 10% performance boost.  Although this does
not seem to be huge, notice that measurements are done in hot L1 cache
conditions which might not be very representative of production runs. Also
freeing L1 cache from holding the look-up table might allow performance
improvements elsewhere.

References:
[1] https://github.com/cassioneri/calendar

libstdc++-v3/ChangeLog:

* include/std/chrono:
---
 libstdc++-v3/include/std/chrono | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7840099d743..35a7a5e4382 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1269,9 +1269,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

   inline constexpr unsigned __days_per_month[12]
 = { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
-
-  inline constexpr unsigned __last_day[12]
-= { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
 }

 // DAY
@@ -2526,9 +2523,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr chrono::day
   day() const noexcept
   {
-if (!_M_mdl.ok() || (month() == February && _M_y.is_leap()))
-  return chrono::day{29};
-return chrono::day{__detail::__last_day[unsigned(month()) - 1]};
+const auto __m = static_cast(month());
+
+// Excluding February, the last day of month __m is either 30 or 31 or,
+// in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+
+// If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if
__m is odd.
+// Hence, b = __m & 1 = (__m ^ 0) & 1.
+
+// If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if
__m is even.
+// Hence, b = (__m ^ 1) & 1.
+
+// Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
+// __m >= 8, that is, c = __m >> 3.
+
+// The above mathematically justifies this implementation whose
+// performance does not depend on look-up tables being on the L1 cache.
+return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30 :
_M_y.is_leap() ? 29 : 28};
   }

   constexpr
-- 
2.29.2


[PATCH 3/4] libstdc++: More efficient is_leap.

2021-02-23 Thread Cassio Neri via Gcc-patches
This patch reimplements std::chrono::year::is_leap().  Leap year check is
ubiquitously implemented (including here) as:

y % 4 == 0 && (y % 100 != 0 || y % 400 == 0).

The rationale being that testing divisibility by 4 first implies an earlier
return for 75% of the cases, therefore, avoiding the needless calculations of
y % 100 and y % 400. Although this fact is true, it does not take into account
the cost of branching.  This patch, instead, tests divisibility by 100 first:

(y % 100 != 0 || y % 400 == 0) && y % 4 == 0.

It is certainly counterintuitive that this could be more efficient since among
the three divisibility tests (4, 100 and 400) the one by 100 is the only one
that can never provide a definitive answer and a second divisibility test (by 4
or 400) is always required. However, measurements [1] in x86_64 suggest this is
3x more efficient!  A possible explanation is that checking divisibility by 100
first implies a split in the execution path with probabilities of (1%, 99%)
rather than (25%, 75%) when divisibility by 4 is checked first.  This decreases
the entropy of the branching distribution which seems to help prediction.

Given that y belongs to [-32767, 32767] [time.cal.year.members], a more
efficient algorithm [2] to check divisibility by 100 is used (instead of
y % 100 != 0).  Measurements suggest that this optimization improves performance
by 20%.

The patch adds a test that exhaustively compares the result of this
implementation with the ubiquitous one for all y in [-32767, 32767]. Although
its completeness, the test completes in a matter of seconds.

References:
[1] https://stackoverflow.com/a/60646967/1137388
[2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16

libstdc++-v3/ChangeLog:

* include/std/chrono:
* testsuite/std/time/year/2.cc: New test.
---
 libstdc++-v3/include/std/chrono   | 19 -
 libstdc++-v3/testsuite/std/time/year/2.cc | 52 +++
 2 files changed, 70 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/year/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7840099d743..b777acdd4a2 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1597,7 +1597,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

   constexpr bool
   is_leap() const noexcept
-  { return _M_y % 4 == 0 && (_M_y % 100 != 0 || _M_y % 400 == 0); }
+  {
+// Testing divisibility by 100 first gives better performance, that is,
+// return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0;
+
+// It gets even faster if _M_y is in [-536870800, 536870999]
(which is the case here) and
+// _M_y % 100 is replaced by __is_multiple_of_100 below.
+
+// References:
+// [1] https://github.com/cassioneri/calendar
+// [2]
https://accu.org/journals/overload/28/155/overload155.pdf#page=16
+
+constexpr uint32_t __multiplier   = 42949673;
+constexpr uint32_t __bound= 42949669;
+constexpr uint32_t __max_dividend = 1073741799;
+constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
+const bool __is_multiple_of_100 = __multiplier * (_M_y +
__offset) < __bound;
+return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+  }

   explicit constexpr
   operator int() const noexcept
diff --git a/libstdc++-v3/testsuite/std/time/year/2.cc
b/libstdc++-v3/testsuite/std/time/year/2.cc
new file mode 100644
index 000..e22101e305a
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/year/2.cc
@@ -0,0 +1,52 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+// Copyright (C) 2020-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// Class year [time.cal.year_month_day]
+
+#include 
+#include 
+
+// Slow but clear test for leap year.
+constexpr bool
+is_leap_year(const std::chrono::year& y) noexcept
+{
+  const int n = static_cast(y);
+  return n % 4 == 0 && (n % 100 != 0 || n % 400 == 0);
+}
+
+void test01()
+{
+  using namespace std::chrono;
+
+  year y{-32767};
+  while (y < year{32767}) {
+VERIFY( y.is_leap() ==  is_leap_year(y) );
+++y;
+  }
+
+  // One more for y = 32767.
+  VERIFY( 

[PATCH 2/4] libstdc++: More efficient days from date.

2021-02-23 Thread Cassio Neri via Gcc-patches
This patch reimplements std::chrono::year_month_day::_M_days_since_epoch()
which calculates the number of elapsed days since 1970/01/01.  The new
implementation is based on Proposition 6.2 of Neri and Schneider, "Euclidean
Affine Functions and Applications to Calendar Algorithms" available at
https://arxiv.org/abs/2102.06959.

The aforementioned paper benchmarks the implementation against several
counterparts, including libc++'s (which is identical to the current
implementation).  The results, shown in Figure 3, indicate the new algorithm is
1.7 times faster than the current one.

The patch adds a test which loops through all dates in [-32767/01/01,
32767/12/31], and for each of them, gets the number of days and compares the
result against its expected value. The latter is calculated using a much
simpler and easy to understand algorithm but which is also much slower.

The dates used in the test covers the full range of possible values
[time.cal.year.members].  Despite its completeness the test runs in matter of
seconds.

libstdc++-v3/ChangeLog:

* include/std/chrono:
* testsuite/std/time/year_month_day/4.cc: New test.
---
 libstdc++-v3/include/std/chrono   | 34 +
 .../testsuite/std/time/year_month_day/4.cc| 71 +++
 2 files changed, 92 insertions(+), 13 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/year_month_day/4.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7840099d743..30203540f36 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2447,22 +2447,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 chrono::month(__m), chrono::day(__d)};
 }

-// Days since 1970/01/01. Magic.
+// Days since 1970/01/01.
+// Proposition 6.2 of Neri and Schneider, "Euclidean Affine
Functions and Applications to
+// Calendar Algorithms". https://arxiv.org/abs/2102.06959
 constexpr days
 year_month_day::_M_days_since_epoch() const noexcept
 {
-  const auto __y = static_cast(_M_y) - (_M_m <= February);
-  const auto __m = static_cast(_M_m);
-  const auto __d = static_cast(_M_d);
-  const auto __era = (__y >= 0 ? __y : __y - 399) / 400;
-  // Year of "era" [0, 399].
-  const auto __yoe = static_cast(__y - __era * 400);
-  // Day of year [0, 365].
-  const auto __doy = (153 * (__m > 2 ? __m - 3 : __m + 9) + 2) /
5 + __d - 1;
-  // Day of "era" [0, 146096].
-  const auto __doe = __yoe * 365 + __yoe / 4 - __yoe / 100 + __doy;
-  const auto __days = __era * 146097 + static_cast(__doe) - 719468;
-  return days{__days};
+  auto constexpr __z2= static_cast(-1468000);
+  auto constexpr __r2_e3 = static_cast(536895458);
+
+  const auto __y1 = static_cast(static_cast(_M_y)) - __z2;
+  const auto __m1 = static_cast(_M_m);
+  const auto __d1 = static_cast(_M_d);
+
+  const auto __j  = static_cast(__m1 < 3);
+  const auto __y0 = __y1 - __j;
+  const auto __m0 = __j ? __m1 + 12 : __m1;
+  const auto __d0 = __d1 - 1;
+
+  const auto __q1 = __y0 / 100;
+  const auto __yc = 1461 * __y0 / 4 - __q1 + __q1 / 4;
+  const auto __mc = (979 *__m0 - 2919) / 32;
+  const auto __dc = __d0;
+
+  return days{static_cast(__yc + __mc + __dc - __r2_e3)};
 }

 // YEAR_MONTH_DAY_LAST
diff --git a/libstdc++-v3/testsuite/std/time/year_month_day/4.cc
b/libstdc++-v3/testsuite/std/time/year_month_day/4.cc
new file mode 100644
index 000..2194af17775
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/year_month_day/4.cc
@@ -0,0 +1,71 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+// Copyright (C) 2020-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// Class year_month_day [time.cal.year_month_day]
+
+#include 
+#include 
+
+// Slow but very clear way of advancing one day.
+constexpr void
+advance(std::chrono::year_month_day& ymd) noexcept {
+
+  using namespace std::chrono;
+
+  auto y = ymd.year();
+  auto m = ymd.month();
+  auto d = ymd.day();
+
+  if (d != year_month_day_last{year{y}, month_day_last{m}}.day())
+++d;
+  else {
+d = day{1};
+if (m != December)
+  ++m;
+else {
+  m = January;
+  ++y;
+}
+  }
+  

[PATCH 1/4] libstdc++: More efficient date from days.

2021-02-23 Thread Cassio Neri via Gcc-patches
This patch reimplements std::chrono::year_month_day::_S_from_days() which
retrieves a date from the number of elapsed days since 1970/01/01.  The new
implementation is based on Proposition 6.3 of Neri and Schneider, "Euclidean
Affine Functions and Applications to Calendar Algorithms" available at
https://arxiv.org/abs/2102.06959.

The aforementioned paper benchmarks the implementation against several
counterparts, including libc++'s (which is identical to the current
implementation).  The results, shown in Figure 4, indicate the new algorithm is
2.2 times faster than the current one.

The patch adds a test which loops through all integers in [-12687428, 11248737],
and for each of them, gets the corresponding date and compares the result
against its expected value.  The latter is calculated using a much simpler and
easy to understand algorithm but which is also much slower.

The interval used in the test covers the full range of values for which a
roundtrip must work [time.cal.ymd.members].  Despite its completeness the test
runs in a matter of seconds.

libstdc++-v3/ChangeLog:

* include/std/chrono:
* testsuite/std/time/year_month_day/3.cc: New test.
---
 libstdc++-v3/include/std/chrono   | 46 
 .../testsuite/std/time/year_month_day/3.cc| 71 +++
 2 files changed, 104 insertions(+), 13 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/year_month_day/3.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7840099d743..d224762fd3f 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2429,22 +2429,42 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // TODO: Implement operator<<, from_stream.
 };

-// Construct from days since 1970/01/01. Magic.
+// Construct from days since 1970/01/01.
+// Proposition 6.3 of Neri and Schneider, "Euclidean Affine
Functions and Applications to
+// Calendar Algorithms". https://arxiv.org/abs/2102.06959
 constexpr year_month_day
 year_month_day::_S_from_days(const days& __dp) noexcept
 {
-  const auto __z = __dp.count() + 719468;
-  const auto __era = (__z >= 0 ? __z : __z - 146096) / 146097;
-  const auto __doe = static_cast(__z - __era * 146097);
-  const auto __yoe
-= (__doe - __doe / 1460 + __doe / 36524 - __doe / 146096) / 365;
-  const auto __y = static_cast(__yoe) + __era * 400;
-  const auto __doy = __doe - (365 * __yoe + __yoe / 4 - __yoe / 100);
-  const auto __mp = (5 * __doy + 2) / 153;
-  const auto __d = __doy - (153 * __mp + 2) / 5 + 1;
-  const auto __m = __mp < 10 ? __mp + 3 : __mp - 9;
-  return year_month_day{chrono::year(__y + (__m <= 2)),
-chrono::month(__m), chrono::day(__d)};
+  constexpr auto __z2= static_cast(-1468000);
+  constexpr auto __r2_e3 = static_cast(536895458);
+
+  const auto __r0 = __dp.count() + __r2_e3;
+
+  const auto __n1 = 4 * __r0 + 3;
+  const auto __q1 = __n1 / 146097;
+  const auto __r1 = __n1 % 146097 / 4;
+
+  constexpr auto __p32 = static_cast(1) << 32;
+  const auto __n2 = 4 * __r1 + 3;
+  const auto __u2 = static_cast(2939745) * __n2;
+  const auto __q2 = static_cast(__u2 / __p32);
+  const auto __r2 = static_cast(__u2 % __p32) / 2939745 / 4;
+
+  constexpr auto __p16 = static_cast(1) << 16;
+  const auto __n3 = 2141 * __r2 + 197913;
+  const auto __q3 = __n3 / __p16;
+  const auto __r3 = __n3 % __p16 / 2141;
+
+  const auto __y0 = 100 * __q1 + __q2;
+  const auto __m0 = __q3;
+  const auto __d0 = __r3;
+
+  const auto __j  = __r2 >= 306;
+  const auto __y1 = __y0 + __j;
+  const auto __m1 = __j ? __m0 - 12 : __m0;
+  const auto __d1 = __d0 + 1;
+
+  return year_month_day{chrono::year{__y1 + __z2},
chrono::month{__m1}, chrono::day{__d1}};
 }

 // Days since 1970/01/01. Magic.
diff --git a/libstdc++-v3/testsuite/std/time/year_month_day/3.cc
b/libstdc++-v3/testsuite/std/time/year_month_day/3.cc
new file mode 100644
index 000..eede649cd54
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/time/year_month_day/3.cc
@@ -0,0 +1,71 @@
+// { dg-options "-std=gnu++2a" }
+// { dg-do run { target c++2a } }
+
+// Copyright (C) 2020-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file