Re: [PATCH 2/2] libstdc++: Handle encodings in localized chrono formatting [PR109162]

2024-02-01 Thread Arsen Arsenović
Hi,

Jonathan Wakely  writes:

> I am undecided about pushing this PATCH 2/2 to trunk. PATCH 1/2 needs to
> be done now due to the ABI impact on cris-elf. This one could wait for
> stage 1. (HP, this one probably isn't of interest to you, but I don't
> know how to tell git-send-email to only CC you on the first patch).

(haven't read the patchset, just weighing in with Git tips)

You can, instead of specifying git send-email --cc=..., just place Cc:
 in the patch file to Cc just that patch.  G-S-E will aggregate
them together before shipping out.

Hope that helps, have a lovely day.

> I'd like to do this change now so people can experiment with the new
> feature by linking with -lstdc++exp, but the problem is that linking to
> that library becomes a hard requirement for any uses of std::chrono
> types with std::format in C++23. So code that compiles and links in
> C++20 mode will fail to link in C++23 mode unless you use the extra
> libstdc++exp.a library. The code might have worked fine in C++20 mode,
> or it might have just linked OK but produce mojibake at runtime, it
> depends on the locale.
>
> Declaring the new functions as weak and then checking them for non-null
> at runtime doesn't really help, because the non-weak symbols in the
> static lib won't be used without -Wl,--whole-archive -lstdc++exp because
> it's OK for weak symbols to be unresolved. We could play games with
> linker scripts so that libstdc++exp.a is a linker script which ensures
> the non-weak defs get used, but I'm not convinced it's worth the hassle
> for a feature that will probably get added to the main libstdc++.so in
> stage 1 anyway.
>
> What do others think? Should I just push it and require -lstdc++exp for
> chrono formatting in C++23 mode?
>
> -- >8 --
>
> This implements the C++23 paper P2419R2 (Clarify handling of encodings
> in localized formatting of chrono types). The requirement is that when
> the literal encoding is "a Unicode encoding form" and the formatting
> locale uses a different encoding, any locale-specific strings such as
> "août" for std::chrono::August should be converted to the literal
> encoding.
>
> Using the recently-added std::locale::encoding() function we can check
> the locale's encoding and then use iconv if a conversion is needed. This
> requires a new non-inline function in the library. As this is currently
> experimental, the new symbol is added to libstdc++exp.a for now. This
> means that formatting std::chrono types requires -lstdc++exp in C++23
> and later, at least until the new function is considered stable enough
> to add to the main library. When that happens, we might want to consider
> enabling these encoding conversions for C++20 too, treating it as a DR
> that resolves LWG 3656.
>
> Because nl_langinfo_l and iconv_open both allocate memory, a naive
> implementation would perform multiple allocations and deallocations for
> every snippet of locale-specific text that needs to be converted to
> UTF-8. To avoid that, a new internal locale::facet is defined to store
> the text_encoding and an iconv_t descriptor, which are then cached in
> the formatting locale. This requires access to the internals of a
> std::locale object in src/c++26/text_encoding.cc, so that file now needs
> to be compiled with -fno-access-control. When the new symbols move into
> libstdc++.so we should be able to avoid that by adding new member
> functions to std::locale.
>
> With this change we can increase the value of the __cpp_lib_format macro
> for C++23. The value should be 202207 for P2419R2, but we already
> implement P2510R3 (Formatting pointers) so can use the value 202304.
>
> libstdc++-v3/ChangeLog:
>
>   PR libstdc++/109162
>   * include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific):
>   Add new accessor functions to use a reserved bit in _Spec.
>   (__formatter_chrono::_M_parse): Use _M_locale_specific(true)
>   when chrono-specs contains locale-dependent conversion
>   specifiers.
>   (__formatter_chrono::_M_format): Open iconv descriptor if
>   conversion to UTF-8 will be needed.
>   (__formatter_chrono::_M_write): New function to write a
>   localized string with possible character conversion.
>   (__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
>   (__formatter_chrono::_M_p, __formatter_chrono::_M_r)
>   (__formatter_chrono::_M_x, __formatter_chrono::_M_X)
>   (__formatter_chrono::_M_locale_fmt): Use _M_write.
>   * include/bits/version.def: Add C++23 values for format.
>   * include/bits/version.h: Regenerate.
>   * include/std/format (_GLIBCXX_P2518R3): Check feature test
>   macro instead of __cplusplus.
>   (basic_format_context): Declare __formatter_chrono as friend.
>   * src/c++26/Makefile.am: Use -fno-access-control for
>   text_encoding.o.
>   * src/c++26/Makefile.in: Regenerate.
>   * src/c++26/text_encoding.cc (__encoding): New locale facet.
>   (locale::encoding): Check for __en

[PATCH 2/2] libstdc++: Handle encodings in localized chrono formatting [PR109162]

2024-02-01 Thread Jonathan Wakely
I am undecided about pushing this PATCH 2/2 to trunk. PATCH 1/2 needs to
be done now due to the ABI impact on cris-elf. This one could wait for
stage 1. (HP, this one probably isn't of interest to you, but I don't
know how to tell git-send-email to only CC you on the first patch).

I'd like to do this change now so people can experiment with the new
feature by linking with -lstdc++exp, but the problem is that linking to
that library becomes a hard requirement for any uses of std::chrono
types with std::format in C++23. So code that compiles and links in
C++20 mode will fail to link in C++23 mode unless you use the extra
libstdc++exp.a library. The code might have worked fine in C++20 mode,
or it might have just linked OK but produce mojibake at runtime, it
depends on the locale.

Declaring the new functions as weak and then checking them for non-null
at runtime doesn't really help, because the non-weak symbols in the
static lib won't be used without -Wl,--whole-archive -lstdc++exp because
it's OK for weak symbols to be unresolved. We could play games with
linker scripts so that libstdc++exp.a is a linker script which ensures
the non-weak defs get used, but I'm not convinced it's worth the hassle
for a feature that will probably get added to the main libstdc++.so in
stage 1 anyway.

What do others think? Should I just push it and require -lstdc++exp for
chrono formatting in C++23 mode?

-- >8 --

This implements the C++23 paper P2419R2 (Clarify handling of encodings
in localized formatting of chrono types). The requirement is that when
the literal encoding is "a Unicode encoding form" and the formatting
locale uses a different encoding, any locale-specific strings such as
"août" for std::chrono::August should be converted to the literal
encoding.

Using the recently-added std::locale::encoding() function we can check
the locale's encoding and then use iconv if a conversion is needed. This
requires a new non-inline function in the library. As this is currently
experimental, the new symbol is added to libstdc++exp.a for now. This
means that formatting std::chrono types requires -lstdc++exp in C++23
and later, at least until the new function is considered stable enough
to add to the main library. When that happens, we might want to consider
enabling these encoding conversions for C++20 too, treating it as a DR
that resolves LWG 3656.

Because nl_langinfo_l and iconv_open both allocate memory, a naive
implementation would perform multiple allocations and deallocations for
every snippet of locale-specific text that needs to be converted to
UTF-8. To avoid that, a new internal locale::facet is defined to store
the text_encoding and an iconv_t descriptor, which are then cached in
the formatting locale. This requires access to the internals of a
std::locale object in src/c++26/text_encoding.cc, so that file now needs
to be compiled with -fno-access-control. When the new symbols move into
libstdc++.so we should be able to avoid that by adding new member
functions to std::locale.

With this change we can increase the value of the __cpp_lib_format macro
for C++23. The value should be 202207 for P2419R2, but we already
implement P2510R3 (Formatting pointers) so can use the value 202304.

libstdc++-v3/ChangeLog:

PR libstdc++/109162
* include/bits/chrono_io.h (_ChronoSpec::_M_locale_specific):
Add new accessor functions to use a reserved bit in _Spec.
(__formatter_chrono::_M_parse): Use _M_locale_specific(true)
when chrono-specs contains locale-dependent conversion
specifiers.
(__formatter_chrono::_M_format): Open iconv descriptor if
conversion to UTF-8 will be needed.
(__formatter_chrono::_M_write): New function to write a
localized string with possible character conversion.
(__formatter_chrono::_M_a_A, __formatter_chrono::_M_b_B)
(__formatter_chrono::_M_p, __formatter_chrono::_M_r)
(__formatter_chrono::_M_x, __formatter_chrono::_M_X)
(__formatter_chrono::_M_locale_fmt): Use _M_write.
* include/bits/version.def: Add C++23 values for format.
* include/bits/version.h: Regenerate.
* include/std/format (_GLIBCXX_P2518R3): Check feature test
macro instead of __cplusplus.
(basic_format_context): Declare __formatter_chrono as friend.
* src/c++26/Makefile.am: Use -fno-access-control for
text_encoding.o.
* src/c++26/Makefile.in: Regenerate.
* src/c++26/text_encoding.cc (__encoding): New locale facet.
(locale::encoding): Check for __encoding facet first. Add fast
path for "C" locale.
(__with_encoding_conversion): New function to add __encoding
facet to a locale.
(__locale_encoding_to_utf8): New function to convert a string
from a locale's encoding to UTF-8.
* testsuite/20_util/duration/io.cc: Add -lstdc++exp for C++23.
* testsuite/std/time/clock/file/io.cc: Likewise.
* testsuit