On 30/04/19 16:06 +0100, Jonathan Wakely wrote:
Fix several bugs in the encoding conversions for filesystem::path that
prevent conversion of Unicode characters outside the Basic Multilingual
Plane, and prevent returning basic_string specializations with
alternative allocator types.
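As an illustration of the allocator case (a minimal sketch, not code from the patch or its tests), the following asks path::string for a UTF-16 string that uses a polymorphic allocator. The helper name to_pmr_u16string is made up for this example, and C++17 with <memory_resource> support is assumed.

```cpp
#include <filesystem>
#include <memory_resource>
#include <string>

// Hypothetical helper (not part of libstdc++): convert a path to UTF-16,
// allocating the result from a caller-supplied memory resource.
inline std::pmr::u16string
to_pmr_u16string(const std::filesystem::path& p, std::pmr::memory_resource* mr)
{
    using Alloc = std::pmr::polymorphic_allocator<char16_t>;
    // path::string takes the allocator as its only argument and must
    // construct the returned string with it.
    return p.string<char16_t, std::char_traits<char16_t>, Alloc>(Alloc(mr));
}
```

With the fix applied, a path constructed from u"\U0001F600" should come back as two UTF-16 code units, and the returned string's get_allocator().resource() should be the resource that was passed in.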

The std::codecvt_utf8 class template is not suitable for UTF-16
conversions because it uses UCS-2 instead. For conversions between UTF-8
and UTF-16 either std::codecvt<C, char, mbstate_t> or
codecvt_utf8_utf16<C> must be used.
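
To illustrate the difference, here is a minimal sketch (not code from the patch) that decodes UTF-8 input with either facet. The helper name decode_utf8 is made up for this example. Note that the <codecvt> facets are deprecated since C++17, so a compiler may warn about them.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: decode UTF-8 into char16_t units using Facet::in
// and return the facet's result code.
template<typename Facet>
std::codecvt_base::result
decode_utf8(const std::string& in, std::u16string& out)
{
    Facet cvt;
    std::mbstate_t state{};
    // UTF-16 never needs more code units than the UTF-8 input has bytes.
    out.assign(in.size(), u'\0');
    const char* from_next = in.data();
    char16_t* to_next = &out[0];
    auto res = cvt.in(state, in.data(), in.data() + in.size(), from_next,
                      &out[0], &out[0] + out.size(), to_next);
    // Keep only the code units that were actually written.
    out.resize(to_next - &out[0]);
    return res;
}
```

For the UTF-8 encoding of U+1F600 ("\xF0\x9F\x98\x80"), decode_utf8<std::codecvt_utf8_utf16<char16_t>> should return ok and produce the surrogate pair D83D DE00, while decode_utf8<std::codecvt_utf8<char16_t>> should not return ok, because the code point cannot be represented in UCS-2.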

The __str_codecvt_in and __str_codecvt_out utilities do not
return false on a partial conversion (e.g. for invalid or incomplete
Unicode input). Add new helpers that treat partial conversions as
errors, and use them for all filesystem::path conversions.
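
The idea of treating a partial conversion as an error can be sketched as follows. This is only an illustration using the public facet interface, not the libstdc++ implementation of the new helpers. The names utf8_to_utf16 and utf8_to_utf16_all are hypothetical.

```cpp
#include <codecvt>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical lenient helper: fails only when the facet reports an error,
// so a truncated multibyte sequence at the end still counts as success.
inline bool
utf8_to_utf16(const std::string& in, std::u16string& out,
              std::size_t& consumed)
{
    std::codecvt_utf8_utf16<char16_t> cvt;
    std::mbstate_t state{};
    out.assign(in.size(), u'\0');
    const char* from_next = in.data();
    char16_t* to_next = &out[0];
    auto res = cvt.in(state, in.data(), in.data() + in.size(), from_next,
                      &out[0], &out[0] + out.size(), to_next);
    out.resize(to_next - &out[0]);
    consumed = from_next - in.data();   // number of input bytes converted
    return res != std::codecvt_base::error;
}

// Hypothetical strict wrapper: also fails unless every input byte was
// consumed, so a partial conversion is reported as an error.
inline bool
utf8_to_utf16_all(const std::string& in, std::u16string& out)
{
    std::size_t consumed = 0;
    return utf8_to_utf16(in, out, consumed) && consumed == in.size();
}
```

For an input such as "a\xF0\x9F\x98" (an 'a' followed by a truncated four-byte sequence), the lenient helper should report success after converting only the 'a', whereas the strict wrapper should return false.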

        PR libstdc++/90281 Fix string conversions for filesystem::path
        * include/bits/fs_path.h (u8path) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]:
        Use codecvt_utf8_utf16 instead of codecvt_utf8. Use
        __str_codecvt_in_all to fail for partial conversions and throw on
        error.
        [!_GLIBCXX_FILESYSTEM_IS_WINDOWS && _GLIBCXX_USE_CHAR8_T]
        (path::_Cvt<char8_t>): Add explicit specialization.
        [_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_wconvert): Remove
        overloads.
        [_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
        if-constexpr instead of dispatching to _S_wconvert. Use codecvt
        instead of codecvt_utf8. Use __str_codecvt_in_all and
        __str_codecvt_out_all.
        [!_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
        codecvt instead of codecvt_utf8. Use __str_codecvt_out_all.
        (path::_S_str_convert) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
        codecvt_utf8_utf16 instead of codecvt_utf8. Construct return values
        with allocator. Use __str_codecvt_out_all. Fallthrough to POSIX code
        after converting to UTF-8.
        (path::_S_str_convert): Use codecvt instead of codecvt_utf8. Use
        __str_codecvt_in_all.
        (path::string): Fix initialization of string types with different
        allocators.
        (path::u8string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
        codecvt_utf8_utf16 instead of codecvt_utf8. Use __str_codecvt_out_all.
        * include/bits/locale_conv.h (__do_str_codecvt): Reorder static and
        runtime conditions.
        (__str_codecvt_out_all, __str_codecvt_in_all): New functions that
        return false for partial conversions.
        * include/experimental/bits/fs_path.h (u8path)
        [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Implement correctly for mingw.
        [_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_wconvert): Add
        missing handling for char8_t. Use codecvt and codecvt_utf8_utf16
        instead of codecvt_utf8. Use __str_codecvt_in_all and
        __str_codecvt_out_all.
        [!_GLIBCXX_FILESYSTEM_IS_WINDOWS] (path::_Cvt::_S_convert): Use
        codecvt instead of codecvt_utf8. Use __str_codecvt_out_all.
        (path::string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
        codecvt_utf8_utf16 instead of codecvt_utf8. Construct return values
        with allocator. Use __str_codecvt_out_all and __str_codecvt_in_all.
        (path::string) [!_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
        __str_codecvt_in_all.
        (path::u8string) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use
        codecvt_utf8_utf16 instead of codecvt_utf8. Use __str_codecvt_out_all.
        * src/c++17/fs_path.cc (path::_S_convert_loc): Use
        __str_codecvt_in_all.
        * src/filesystem/path.cc (path::_S_convert_loc): Likewise.
        * testsuite/27_io/filesystem/path/construct/90281.cc: New test.
        * testsuite/27_io/filesystem/path/factory/u8path.cc: New test.
        * testsuite/27_io/filesystem/path/native/string.cc: Test with empty
        strings and with Unicode characters outside the basic multilingual
        plane.
        * testsuite/27_io/filesystem/path/native/alloc.cc: New test.
        * testsuite/experimental/filesystem/path/construct/90281.cc: New test.
        * testsuite/experimental/filesystem/path/factory/u8path.cc: New test.
        * testsuite/experimental/filesystem/path/native/alloc.cc: New test.
        * testsuite/experimental/filesystem/path/native/string.cc: Test with
        empty strings and with Unicode characters outside the basic
        multilingual plane.

Tested powerpc64le-linux and x86_64-w64-mingw32.

As this ended up being a large patch I'll wait until after 9.1.0 is
released to commit this (and then will backport a simpler version).

I forgot to commit the patch after the release. It's now on trunk, as
r272385, and I'll backport it soon.


