https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71107

            Bug ID: 71107
           Summary: wstring_convert::from_bytes produces wide chars with
                    the wrong byte order
           Product: gcc
           Version: 6.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: cantabile.desu at gmail dot com
  Target Milestone: ---

This small program illustrates the problem:

#include <locale>
#include <codecvt>
#include <cstdint>  // uint8_t
#include <cstdio>
#include <cwchar>   // wcslen
#include <string>

int wmain(int argc, wchar_t **argv) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> utf16;

    printf("Input bytes:\n");
    for (size_t i = 0; i < wcslen(argv[0]) * sizeof(wchar_t); i++)
        printf("%x ", (int)((uint8_t *)argv[0])[i]);
    printf("\n");

    std::string bytes = utf16.to_bytes(argv[0]);

    printf("Text after to_bytes: '%s'\n", bytes.c_str());

    printf("Bytes after to_bytes:\n");
    for (size_t i = 0; i < bytes.size(); i++)
        printf("%x ", (int)((const uint8_t *)bytes.c_str())[i]);
    printf("\n");

    std::wstring wide = utf16.from_bytes(bytes);

    printf("Bytes after from_bytes:\n");
    for (size_t i = 0; i < wide.size() * sizeof(wchar_t); i++)
        printf("%x ", (int)((const uint8_t *)wide.c_str())[i]);
    printf("\n");

    bytes = utf16.to_bytes(wide);

    printf("Text after to_bytes: '%s'\n", bytes.c_str());

    printf("Bytes after to_bytes:\n");
    for (size_t i = 0; i < bytes.size(); i++)
        printf("%x ", (int)((const uint8_t *)bytes.c_str())[i]);
    printf("\n");

    return 0;
}


Command:
i686-w64-mingw32-g++ -std=c++11 -municode -o test.exe test.cpp -static-libgcc -static-libstdc++

Output when compiled by GCC 6.1.1:
Input bytes:
5a 0 3a 0 5c 0 74 0 6d 0 70 0 5c 0 74 0 65 0 73 0 74 0 2e 0 65 0 78 0 65 0 
Text after to_bytes: 'Z:\tmp\test.exe'
Bytes after to_bytes:
5a 3a 5c 74 6d 70 5c 74 65 73 74 2e 65 78 65 
Bytes after from_bytes:
0 5a 0 3a 0 5c 0 74 0 6d 0 70 0 5c 0 74 0 65 0 73 0 74 0 2e 0 65 0 78 0 65 
Text after to_bytes: '娀㨀尀琀洀瀀尀琀攀猀琀⸀攀砀攀'
Bytes after to_bytes:
e5 a8 80 e3 a8 80 e5 b0 80 e7 90 80 e6 b4 80 e7 80 80 e5 b0 80 e7 90 80 e6 94
80 e7 8c 80 e7 90 80 e2 b8 80 e6 94 80 e7 a0 80 e6 94 80 
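
The garbled text above is consistent with each 16-bit code unit having its two bytes swapped: 'Z' is 0x005A, the swapped unit is 0x5A00 (the character '娀'), and its UTF-8 encoding is e5 a8 80, which matches the first three bytes of the second to_bytes dump. The sketch below reproduces that arithmetic by hand. The helper names swap_bytes and utf8_encode_bmp are made up for illustration and are not part of libstdc++; the encoder handles only non-surrogate BMP code units, which is all this test case needs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: swap the two bytes of a 16-bit code unit.
// This models what reading a UTF-16 unit with the wrong endianness does.
inline std::uint16_t swap_bytes(std::uint16_t unit) {
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}

// Hypothetical helper: encode one non-surrogate BMP code unit as UTF-8.
inline std::string utf8_encode_bmp(std::uint16_t unit) {
    assert(unit < 0xD800 || unit > 0xDFFF);  // surrogates are not handled here
    std::string out;
    if (unit < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(unit);
    } else if (unit < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (unit >> 6));
        out += static_cast<char>(0x80 | (unit & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (unit >> 12));
        out += static_cast<char>(0x80 | ((unit >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (unit & 0x3F));
    }
    return out;
}
```

For example, utf8_encode_bmp(swap_bytes(0x002E)) yields the bytes e2 b8 80, the same sequence that appears in the dump where the '.' of "test.exe" should be.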


Output when compiled by GCC 5.1.0:
Input bytes:
5a 0 3a 0 5c 0 74 0 6d 0 70 0 5c 0 74 0 65 0 73 0 74 0 2e 0 65 0 78 0 65 0 
Text after to_bytes: 'Z:\tmp\test.exe'
Bytes after to_bytes:
5a 3a 5c 74 6d 70 5c 74 65 73 74 2e 65 78 65 
Bytes after from_bytes:
5a 0 3a 0 5c 0 74 0 6d 0 70 0 5c 0 74 0 65 0 73 0 74 0 2e 0 65 0 78 0 65 0 
Text after to_bytes: 'Z:\tmp\test.exe'
Bytes after to_bytes:
5a 3a 5c 74 6d 70 5c 74 65 73 74 2e 65 78 65 

GCC 5.3.0 is affected too.

Output of `i686-w64-mingw32-g++ -v`:
Using built-in specs.
COLLECT_GCC=i686-w64-mingw32-g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i686-w64-mingw32/6.1.1/lto-wrapper
Target: i686-w64-mingw32
Configured with: /build/mingw-w64-gcc/src/gcc-6-20160505/configure
--prefix=/usr --libexecdir=/usr/lib --target=i686-w64-mingw32
--enable-languages=c,lto,c++,objc,obj-c++,fortran,ada --enable-shared
--enable-static --enable-threads=posix --enable-fully-dynamic-string
--enable-libstdcxx-time=yes --with-system-zlib --enable-cloog-backend=isl
--enable-lto --disable-dw2-exceptions --enable-libgomp --disable-multilib
--enable-checking=release
Thread model: posix
gcc version 6.1.1 20160505 (GCC) 

The system is 64-bit Arch Linux. This GCC comes from the Arch Linux
"mingw-w64-gcc" package.
