https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70893

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The little_endian flag only affects the "external representation", not the
in-memory representation. Because the external representation for
codecvt_utf8_utf16 is UTF-8, it ignores the little_endian flag.
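For illustration, here is a minimal sketch (not part of the bug report; the
names and strings are just example values) of why the flag should make no
difference for codecvt_utf8_utf16, since its external side is UTF-8 rather
than UTF-16:

#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

int main()
{
  using namespace std;
  // Internal representation: UTF-16 code units in native byte order.
  u16string in = u"\u00e9\u4e2d";
  // The same conversion with and without the little_endian mode flag.
  wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t> plain;
  wstring_convert<codecvt_utf8_utf16<char16_t, 0x10ffff, little_endian>,
                  char16_t> flagged;
  // Both produce identical UTF-8, because little_endian only describes
  // the external byte order and the external encoding here is UTF-8.
  assert(plain.to_bytes(in) == flagged.to_bytes(in));
}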

If you want to convert UTF-16BE to UTF-8, I think you need to treat UTF-16BE as
the external representation: use std::codecvt_utf16 to convert it to some
internal representation, and then convert that to UTF-8. So the right way to
write that function would be:

#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

template<bool big_e>
inline std::string std_utf16_to_utf8(const char* s, std::size_t sz)
{
  using namespace std;
  sz &= ~size_t(1);  // drop a trailing odd byte
  // codecvt_utf16 reads big-endian bytes by default; the little_endian mode
  // flag switches the external representation to UTF-16LE.
  constexpr codecvt_mode mode = big_e ? codecvt_mode(0) : little_endian;
  // Convert UTF-16 (the external representation) to UCS4 with native endianness:
  wstring_convert<codecvt_utf16<char32_t, 0x10ffff, mode>, char32_t> conv1("");
  u32string u32str = conv1.from_bytes(s, s + sz);
  // Convert UCS4 (with native endianness) to UTF-8:
  wstring_convert<codecvt_utf8<char32_t>, char32_t> conv2("");
  return conv2.to_bytes(u32str);
}
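
For example (a hypothetical call, assuming the templated sketch above):

  // "\0A\0B" is "AB" encoded as UTF-16BE.
  const char be[] = { 0x00, 0x41, 0x00, 0x42 };
  std::string utf8 = std_utf16_to_utf8<true>(be, sizeof be);  // yields "AB"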
