Re: [PATCH 2/2] libstdc++: Handle encodings in localized chrono formatting [PR109162]
Hi, Jonathan Wakely writes: > I am undecided about pushing this PATCH 2/2 to trunk. PATCH 1/2 needs to > be done now due to the ABI impact on cris-elf. This one could wait for > stage 1. (HP, this one probably isn't of interest to you, but I don't > know how to tell git-send-email to only CC you on the first patch). (haven't read the patchset, just weighing in with Git tips) You can, instead of specifying git send-email --cc=..., just place Cc: in the patch file to Cc just that patch. G-S-E will aggregate them together before shipping out. Hope that helps, have a lovely day. > I'd like to do this change now so people can experiment with the new > feature by linking with -lstdc++exp, but the problem is that linking to > that library becomes a hard requirement for any uses of std::chrono > types with std::format in C++23. So code that compiles and links in > C++20 mode will fail to link in C++23 mode unless you use the extra > libstdc++exp.a library. The code might have worked fine in C++20 mode, > or it might have just linked OK but produce mojibake at runtime, it > depends on the locale. > > Declaring the new functions as weak and then checking them for non-null > at runtime doesn't really help, because the non-weak symbols in the > static lib won't be used without -Wl,--whole-archive -lstdc++exp because > it's OK for weak symbols to be unresolved. We could play games with > linker scripts so that libstdc++exp.a is a linker script which ensures > the non-weak defs get used, but I'm not convinced it's worth the hassle > for a feature that will probably get added to the main libstdc++.so in > stage 1 anyway. > > What do others think? Should I just push it and require -lstdc++exp for > chrono formatting in C++23 mode? > > -- >8 -- > > This implements the C++23 paper P2419R2 (Clarify handling of encodings > in localized formatting of chrono types). The requirement is that when > the literal encoding is "a Unicode encoding form" and the formatting > locale uses a different encoding, any locale-specific strings such as > "août" for std::chrono::August should be converted to the literal > encoding. > > Using the recently-added std::locale::encoding() function we can check > the locale's encoding and then use iconv if a conversion is needed. This > requires a new non-inline function in the library. As this is currently > experimental, the new symbol is added to libstdc++exp.a for now. This > means that formatting std::chrono types requires -lstdc++exp in C++23 > and later, at least until the new function is considered stable enough > to add to the main library. When that happens, we might want to consider > enabling these encoding conversions for C++20 too, treating it as a DR > that resolves LWG 3656. > > Because nl_langinfo_l and iconv_open both allocate memory, a naive > implementation would perform multiple allocations and deallocations for > every snippet of locale-specific text that needs to be converted to > UTF-8. To avoid that, a new internal locale::facet is defined to store > the text_encoding and an iconv_t descriptor, which are then cached in > the formatting locale. This requires access to the internals of a > std::locale object in src/c++26/text_encoding.cc, so that file now needs > to be compiled with -fno-access-control. When the new symbols move into > libstdc++.so we should be able to avoid that by adding new member > functions to std::locale. > > With this change we can increase the value of the __cpp_lib_format macro > for C++23. The value should be 202207 for P2419R2, but we already > implement P2510R3 (Formatting pointers) so can use the value 202304. > > libstdc++-v3/ChangeLog: > > PR libstdc++/109162 > * include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific): > Add new accessor functions to use a reserved bit in _Spec. > (__formatter_chrono::_M_parse): Use _M_locale_specific(true) > when chrono-specs contains locale-dependent conversion > specifiers. > (__formatter_chrono::_M_format): Open iconv descriptor if > conversion to UTF-8 will be needed. > (__formatter_chrono::_M_write): New function to write a > localized string with possible character conversion. > (__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B) > (__formatter_chrono::_M_p, __formatter_chrono::_M_r) > (__formatter_chrono::_M_x, __formatter_chrono::_M_X) > (__formatter_chrono::_M_locale_fmt): Use _M_write. > * include/bits/version.def: Add C++23 values for format. > * include/bits/version.h: Regenerate. > * include/std/format (_GLIBCXX_P2518R3): Check feature test > macro instead of __cplusplus. > (basic_format_context): Declare __formatter_chrono as friend. > * src/c++26/Makefile.am: Use -fno-access-control for > text_encoding.o. > * src/c++26/Makefile.in: Regenerate. > * src/c++26/text_encoding.cc (__encoding): New locale facet. > (locale::encoding): Check for __en
[PATCH 2/2] libstdc++: Handle encodings in localized chrono formatting [PR109162]
I am undecided about pushing this PATCH 2/2 to trunk. PATCH 1/2 needs to be done now due to the ABI impact on cris-elf. This one could wait for stage 1. (HP, this one probably isn't of interest to you, but I don't know how to tell git-send-email to only CC you on the first patch). I'd like to do this change now so people can experiment with the new feature by linking with -lstdc++exp, but the problem is that linking to that library becomes a hard requirement for any uses of std::chrono types with std::format in C++23. So code that compiles and links in C++20 mode will fail to link in C++23 mode unless you use the extra libstdc++exp.a library. The code might have worked fine in C++20 mode, or it might have just linked OK but produce mojibake at runtime, it depends on the locale. Declaring the new functions as weak and then checking them for non-null at runtime doesn't really help, because the non-weak symbols in the static lib won't be used without -Wl,--whole-archive -lstdc++exp because it's OK for weak symbols to be unresolved. We could play games with linker scripts so that libstdc++exp.a is a linker script which ensures the non-weak defs get used, but I'm not convinced it's worth the hassle for a feature that will probably get added to the main libstdc++.so in stage 1 anyway. What do others think? Should I just push it and require -lstdc++exp for chrono formatting in C++23 mode? -- >8 -- This implements the C++23 paper P2419R2 (Clarify handling of encodings in localized formatting of chrono types). The requirement is that when the literal encoding is "a Unicode encoding form" and the formatting locale uses a different encoding, any locale-specific strings such as "août" for std::chrono::August should be converted to the literal encoding. Using the recently-added std::locale::encoding() function we can check the locale's encoding and then use iconv if a conversion is needed. This requires a new non-inline function in the library. As this is currently experimental, the new symbol is added to libstdc++exp.a for now. This means that formatting std::chrono types requires -lstdc++exp in C++23 and later, at least until the new function is considered stable enough to add to the main library. When that happens, we might want to consider enabling these encoding conversions for C++20 too, treating it as a DR that resolves LWG 3656. Because nl_langinfo_l and iconv_open both allocate memory, a naive implementation would perform multiple allocations and deallocations for every snippet of locale-specific text that needs to be converted to UTF-8. To avoid that, a new internal locale::facet is defined to store the text_encoding and an iconv_t descriptor, which are then cached in the formatting locale. This requires access to the internals of a std::locale object in src/c++26/text_encoding.cc, so that file now needs to be compiled with -fno-access-control. When the new symbols move into libstdc++.so we should be able to avoid that by adding new member functions to std::locale. With this change we can increase the value of the __cpp_lib_format macro for C++23. The value should be 202207 for P2419R2, but we already implement P2510R3 (Formatting pointers) so can use the value 202304. libstdc++-v3/ChangeLog: PR libstdc++/109162 * include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific): Add new accessor functions to use a reserved bit in _Spec. (__formatter_chrono::_M_parse): Use _M_locale_specific(true) when chrono-specs contains locale-dependent conversion specifiers. (__formatter_chrono::_M_format): Open iconv descriptor if conversion to UTF-8 will be needed. (__formatter_chrono::_M_write): New function to write a localized string with possible character conversion. (__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B) (__formatter_chrono::_M_p, __formatter_chrono::_M_r) (__formatter_chrono::_M_x, __formatter_chrono::_M_X) (__formatter_chrono::_M_locale_fmt): Use _M_write. * include/bits/version.def: Add C++23 values for format. * include/bits/version.h: Regenerate. * include/std/format (_GLIBCXX_P2518R3): Check feature test macro instead of __cplusplus. (basic_format_context): Declare __formatter_chrono as friend. * src/c++26/Makefile.am: Use -fno-access-control for text_encoding.o. * src/c++26/Makefile.in: Regenerate. * src/c++26/text_encoding.cc (__encoding): New locale facet. (locale::encoding): Check for __encoding facet first. Add fast path for "C" locale. (__with_encoding_conversion): New function to add __encoding facet to a locale. (__locale_encoding_to_utf8): New function to convert a string from a locale's encoding to UTF-8. * testsuite/20_util/duration/io.cc: Add -lstdc++exp for C++23. * testsuite/std/time/clock/file/io.cc: Likewise. * testsuit