Re: [PATCH] libstdc++: Optimize integer std::from_chars
On Thu, 14 Apr 2022 at 13:32, Patrick Palka via Libstdc++ wrote: > > This applies the following optimizations to the integer std::from_chars > implementation: > > 1. Use a lookup table for converting an alphanumeric digit to its > base-36 value instead of using a range test (for 0-9) and switch > (for a-z and A-Z). The table is constructed using a C++14 > constexpr function which doesn't assume a particular character > encoding or __CHAR_BIT__ value. The new conversion function > __from_chars_alnum_to_val is templated on whether we care > only about the decimal digits, in which case we can perform the > conversion with a single subtraction since the digit characters > are guaranteed to be contiguous (unlike the letters). > 2. Generalize __from_chars_binary to handle all power-of-two bases. > This function, now named __from_chars_pow2_base, is also templated > on whether we care only about the decimal digits in order to speed > up digit conversion for base 2, 4 and 8. > 3. In __from_chars_digit, use >static_cast(__c - '0') < __base > instead of >'0' <= __c && __c <= ('0' + (__base - 1)). > as the digit recognition test (exhaustively verified that the two > tests are equivalent). > 4. In __from_chars_alnum, use a nested loop to consume the rest of the > digits in the overflow case (mirroring __from_chars_digit) so that > the main loop doesn't have to maintain the __valid overflow flag. > > At this point, __from_chars_digit is nearly identical to > __from_chars_alnum, so this patch combines the two functions, removing > the former and templatizing the latter according to whether we care only > about the decimal digits. Finally, > > 5. In __from_chars_alnum, keep track of a lower bound on the number of > unused bits in the result and use that to omit the overflow check > when it's safe to do so. > > In passing this replaces the non-portable function ascii_to_hexit > used by __floating_from_chars_hex with the new conversion function. > > Here are some runtime measurements for a simple 15-line benchmark that > roundtrips printing/parsing 200 million integers via std::to/from_chars > (average of 5 runs): > > Base Before After (seconds, lower is better) > 29.37 9.37 > 3 12.13 15.79 > 83.67 4.15 > 103.86 4.90 > 115.03 6.84 > 162.93 4.14 > 322.39 3.85 > 363.26 5.22 > > Testedon x86_64-pc-linux-gnu, does this look OK for trunk? Also tested > against libc++'s from_chars tests for good measure. > > libstdc++-v3/ChangeLog: > > * include/std/charconv (__from_chars_alnum_to_val_table): Define. > (__from_chars_alnum_to_val): Define. > (__from_chars_binary): Rename to ... > (__from_chars_pow2_base): ... this. Generalize to handle any > power-of-two base using __from_chars_alnum_to_val. > (__from_chars_digit): Optimize digit recognition to a single > test instead of two tests. Use [[__unlikely___]] attribute. > (__from_chars_alpha_to_num): Remove. > (__from_chars_alnum): Use __from_chars_alnum_to_val. Use a > nested loop for the overflow case. > (from_chars): Adjust appropriately. > * src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove. > (__floating_from_chars_hex): Use __from_chars_alnum_to_val > to recognize a hex digit instead. > --- > libstdc++-v3/include/std/charconv | 250 -- > libstdc++-v3/src/c++17/floating_from_chars.cc | 18 +- > 2 files changed, 105 insertions(+), 163 deletions(-) > > diff --git a/libstdc++-v3/include/std/charconv > b/libstdc++-v3/include/std/charconv > index 2ce9c7d4cb9..5e44459749a 100644 > --- a/libstdc++-v3/include/std/charconv > +++ b/libstdc++-v3/include/std/charconv > @@ -407,176 +407,127 @@ namespace __detail >return true; > } > > - /// std::from_chars implementation for integers in base 2. > - template > + // Construct and return a lookup table that maps 0-9, A-Z and a-z to the > + // corresponding corresponding base-36 value and maps all other characters > + // to 127. > + constexpr auto > + __from_chars_alnum_to_val_table() > + { > +constexpr unsigned char __lower_letters[] > + = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > + 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', > + 'u', 'v', 'w', 'x', 'y', 'z' }; > +constexpr unsigned char __upper_letters[] > + = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > + 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', > + 'U', 'V', 'W', 'X', 'Y', 'Z' }; > +struct { unsigned char __data[1u << __CHAR_BIT__] = {}; } __table; > +for (auto& __entry : __table.__data) > + __entry = 127; > +for (int __i = 0; __i < 10; ++__i) > + __table.__data['0' + __i] = __i; > +for (int __i = 0; __i < 26; +
Re: [PATCH] libstdc++: Optimize integer std::from_chars
On Thu, 14 Apr 2022, Patrick Palka wrote: > This applies the following optimizations to the integer std::from_chars > implementation: > > 1. Use a lookup table for converting an alphanumeric digit to its > base-36 value instead of using a range test (for 0-9) and switch > (for a-z and A-Z). The table is constructed using a C++14 > constexpr function which doesn't assume a particular character > encoding or __CHAR_BIT__ value. The new conversion function > __from_chars_alnum_to_val is templated on whether we care > only about the decimal digits, in which case we can perform the > conversion with a single subtraction since the digit characters > are guaranteed to be contiguous (unlike the letters). > 2. Generalize __from_chars_binary to handle all power-of-two bases. > This function, now named __from_chars_pow2_base, is also templated > on whether we care only about the decimal digits in order to speed > up digit conversion for base 2, 4 and 8. > 3. In __from_chars_digit, use >static_cast(__c - '0') < __base > instead of >'0' <= __c && __c <= ('0' + (__base - 1)). > as the digit recognition test (exhaustively verified that the two > tests are equivalent). > 4. In __from_chars_alnum, use a nested loop to consume the rest of the > digits in the overflow case (mirroring __from_chars_digit) so that > the main loop doesn't have to maintain the __valid overflow flag. > > At this point, __from_chars_digit is nearly identical to > __from_chars_alnum, so this patch combines the two functions, removing > the former and templatizing the latter according to whether we care only > about the decimal digits. Finally, > > 5. In __from_chars_alnum, keep track of a lower bound on the number of > unused bits in the result and use that to omit the overflow check > when it's safe to do so. > > In passing this replaces the non-portable function ascii_to_hexit > used by __floating_from_chars_hex with the new conversion function. > > Here are some runtime measurements for a simple 15-line benchmark that > roundtrips printing/parsing 200 million integers via std::to/from_chars > (average of 5 runs): > > Base Before After (seconds, lower is better) > 29.37 9.37 > 3 12.13 15.79 > 83.67 4.15 > 103.86 4.90 > 115.03 6.84 > 162.93 4.14 > 322.39 3.85 > 363.26 5.22 Whoops, the second and third columns should be swapped (runtime is smaller after the patch across the board). > > Testedon x86_64-pc-linux-gnu, does this look OK for trunk? Also tested > against libc++'s from_chars tests for good measure. > > libstdc++-v3/ChangeLog: > > * include/std/charconv (__from_chars_alnum_to_val_table): Define. > (__from_chars_alnum_to_val): Define. > (__from_chars_binary): Rename to ... > (__from_chars_pow2_base): ... this. Generalize to handle any > power-of-two base using __from_chars_alnum_to_val. > (__from_chars_digit): Optimize digit recognition to a single > test instead of two tests. Use [[__unlikely___]] attribute. > (__from_chars_alpha_to_num): Remove. > (__from_chars_alnum): Use __from_chars_alnum_to_val. Use a > nested loop for the overflow case. > (from_chars): Adjust appropriately. > * src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove. > (__floating_from_chars_hex): Use __from_chars_alnum_to_val > to recognize a hex digit instead. > --- > libstdc++-v3/include/std/charconv | 250 -- > libstdc++-v3/src/c++17/floating_from_chars.cc | 18 +- > 2 files changed, 105 insertions(+), 163 deletions(-) > > diff --git a/libstdc++-v3/include/std/charconv > b/libstdc++-v3/include/std/charconv > index 2ce9c7d4cb9..5e44459749a 100644 > --- a/libstdc++-v3/include/std/charconv > +++ b/libstdc++-v3/include/std/charconv > @@ -407,176 +407,127 @@ namespace __detail >return true; > } > > - /// std::from_chars implementation for integers in base 2. > - template > + // Construct and return a lookup table that maps 0-9, A-Z and a-z to the > + // corresponding corresponding base-36 value and maps all other characters > + // to 127. > + constexpr auto > + __from_chars_alnum_to_val_table() > + { > +constexpr unsigned char __lower_letters[] > + = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > + 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', > + 'u', 'v', 'w', 'x', 'y', 'z' }; > +constexpr unsigned char __upper_letters[] > + = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > + 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', > + 'U', 'V', 'W', 'X', 'Y', 'Z' }; > +struct { unsigned char __data[1u << __CHAR_BIT__] = {}; } __table; > +for (auto& __entry : __table.__data) > + __entry = 127; > +for (int __i = 0; __i < 10; ++__i) > + __tabl
RE: [PATCH] libstdc++: Optimize integer std::from_chars
> -Original Message- > From: Gcc-patches bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Patrick Palka > via Gcc-patches > Sent: Thursday, April 14, 2022 1:31 PM > To: gcc-patches@gcc.gnu.org > Cc: libstd...@gcc.gnu.org > Subject: [PATCH] libstdc++: Optimize integer std::from_chars > > This applies the following optimizations to the integer std::from_chars > implementation: > > 1. Use a lookup table for converting an alphanumeric digit to its > base-36 value instead of using a range test (for 0-9) and switch > (for a-z and A-Z). The table is constructed using a C++14 > constexpr function which doesn't assume a particular character > encoding or __CHAR_BIT__ value. The new conversion function > __from_chars_alnum_to_val is templated on whether we care > only about the decimal digits, in which case we can perform the > conversion with a single subtraction since the digit characters > are guaranteed to be contiguous (unlike the letters). > 2. Generalize __from_chars_binary to handle all power-of-two bases. > This function, now named __from_chars_pow2_base, is also templated > on whether we care only about the decimal digits in order to speed > up digit conversion for base 2, 4 and 8. > 3. In __from_chars_digit, use >static_cast(__c - '0') < __base > instead of >'0' <= __c && __c <= ('0' + (__base - 1)). > as the digit recognition test (exhaustively verified that the two > tests are equivalent). > 4. In __from_chars_alnum, use a nested loop to consume the rest of the > digits in the overflow case (mirroring __from_chars_digit) so that > the main loop doesn't have to maintain the __valid overflow flag. > > At this point, __from_chars_digit is nearly identical to > __from_chars_alnum, so this patch combines the two functions, removing > the former and templatizing the latter according to whether we care only > about the decimal digits. Finally, > > 5. In __from_chars_alnum, keep track of a lower bound on the number of > unused bits in the result and use that to omit the overflow check > when it's safe to do so. > > In passing this replaces the non-portable function ascii_to_hexit > used by __floating_from_chars_hex with the new conversion function. > > Here are some runtime measurements for a simple 15-line benchmark that > roundtrips printing/parsing 200 million integers via std::to/from_chars > (average of 5 runs): > > Base Before After (seconds, lower is better) > 29.37 9.37 > 3 12.13 15.79 > 83.67 4.15 > 103.86 4.90 > 115.03 6.84 > 162.93 4.14 > 322.39 3.85 > 363.26 5.22 The after numbers look worse (higher)? Are the columns accidentally swapped? Thanks, Kyrill > > Testedon x86_64-pc-linux-gnu, does this look OK for trunk? Also tested > against libc++'s from_chars tests for good measure. > > libstdc++-v3/ChangeLog: > > * include/std/charconv (__from_chars_alnum_to_val_table): Define. > (__from_chars_alnum_to_val): Define. > (__from_chars_binary): Rename to ... > (__from_chars_pow2_base): ... this. Generalize to handle any > power-of-two base using __from_chars_alnum_to_val. > (__from_chars_digit): Optimize digit recognition to a single > test instead of two tests. Use [[__unlikely___]] attribute. > (__from_chars_alpha_to_num): Remove. > (__from_chars_alnum): Use __from_chars_alnum_to_val. Use a > nested loop for the overflow case. > (from_chars): Adjust appropriately. > * src/c++17/floating_from_chars.cc (ascii_to_hexit): Remove. > (__floating_from_chars_hex): Use __from_chars_alnum_to_val > to recognize a hex digit instead. > --- > libstdc++-v3/include/std/charconv | 250 -- > libstdc++-v3/src/c++17/floating_from_chars.cc | 18 +- > 2 files changed, 105 insertions(+), 163 deletions(-) > > diff --git a/libstdc++-v3/include/std/charconv b/libstdc++- > v3/include/std/charconv > index 2ce9c7d4cb9..5e44459749a 100644 > --- a/libstdc++-v3/include/std/charconv > +++ b/libstdc++-v3/include/std/charconv > @@ -407,176 +407,127 @@ namespace __detail >return true; > } > > - /// std::from_chars implementation for integers in base 2. > - template > + // Construct and return a lookup table that maps 0-9, A-Z and a-z to the > + // corresponding corresponding base-36 value and maps all other > characters > + // to 127. > + constexpr auto > + __from_chars_alnum_to_val_table() > + { > +constexpr unsigned char __lower_letters[] > + = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', > + 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', > + 'u', 'v', 'w', 'x', 'y', 'z' }; > +constexpr unsigned char __upper_letters[] > + = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', > + 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S',