Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

Dāvis Mosāns Thu, 21 Jul 2016 17:44:02 -0700

2016-07-21 20:46 GMT+03:00 Brad King <brad.k...@kitware.com>:
> On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
>> Anyway I improved this in places where it was easy, but in some places it's
>> more complicated...
>>
>> For example
>>
>>    while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>>      // Put the output in the right place.
>>      if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>>        if (output_variable.empty()) {
>>          cmSystemTools::Stdout(data, length);
>>
>> Here we output buffer immediately.
>>
>>    while ((out || err) &&
>>           (p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>>      if (out && p == cmsysProcess_Pipe_STDOUT) {
>>        if (!out->Process(data, length)) {
>
> In such cases the data need to be piped through a buffered decoder
> that can keep partial fragments around between updates.
>
> Does MultiByteToWideChar or some other API have a way to detect
> such boundaries?


As far as I know, WinAPI has no such function.
With MultiByteToWideChar, a partial character like that would be replaced with
? (U+003F) or � (U+FFFD).

We would need to use some library or implement this ourselves.
WinAPI has CharPrevExA and IsDBCSLeadByteEx (or GetCPInfo), which we could use
to implement this easily for code pages with 1-2 byte characters, but that
doesn't work for code pages where a character can be more than 2 bytes, e.g.
UTF-8. Those would need to be handled separately.
We could also check whether the last converted character is ? and, if so, try
again with one byte less.
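
To make the 1-2 byte case concrete, here is a rough sketch of how the dangling
lead byte could be detected. DbcsIncompleteTail and IsCp932LeadByte are
hypothetical names, not existing CMake or KWSys code. The predicate stands in
for a per-code-page query such as IsDBCSLeadByteEx, and the buffer is assumed
to start on a character boundary. It scans forward because a trail byte can
fall in the lead byte range, so looking at the last byte alone is ambiguous.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: returns how many bytes at the end of the buffer belong
// to an incomplete double-byte character (0 or 1). Assumes the buffer starts
// on a character boundary. isLeadByte is a stand-in for a code page query
// such as IsDBCSLeadByteEx(codepage, byte) on Windows.
std::size_t DbcsIncompleteTail(
  const char* data, std::size_t length,
  const std::function<bool(unsigned char)>& isLeadByte)
{
  std::size_t i = 0;
  while (i < length) {
    if (isLeadByte(static_cast<unsigned char>(data[i]))) {
      if (i + 1 >= length) {
        return 1; // lead byte whose trail byte has not arrived yet
      }
      i += 2; // skip lead byte and trail byte together
    } else {
      i += 1; // single-byte character
    }
  }
  return 0;
}

// Illustration only: lead byte ranges of Shift-JIS (code page 932).
bool IsCp932LeadByte(unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}
```

The caller would convert only the first length - tail bytes and prepend the
remaining byte to the next chunk.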


Using EnumSystemCodePages and GetCPInfoEx, I collected info about all code
pages supported on my Windows 10 system:

https://paste.kde.org/pthwqdbxv/rjwgwd/raw

Code pages where MaxCharSize is more than 1 and UseLeadByte is No need special
handling that depends on the particular encoding.
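
UTF-8 (code page 65001) is one such case. As an illustration of what the
encoding-specific handling might look like, this sketch counts the trailing
bytes of a buffer that form an incomplete UTF-8 sequence. Utf8IncompleteTail
is a hypothetical name, and the function does not validate the input. It only
looks at the bit patterns of the last few bytes.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes at the end of the buffer that start a
// UTF-8 sequence but do not complete it (0 to 3). Malformed input is not
// diagnosed here and is left for the real decoder to handle.
std::size_t Utf8IncompleteTail(const char* data, std::size_t length)
{
  // A UTF-8 sequence is at most 4 bytes, so look back at most 4 bytes.
  for (std::size_t back = 0; back < length && back < 4; ++back) {
    const unsigned char c =
      static_cast<unsigned char>(data[length - 1 - back]);
    if ((c & 0xC0) == 0x80) {
      continue; // continuation byte (10xxxxxx), keep looking for the lead
    }
    // c is ASCII or a lead byte; work out how long its sequence should be.
    std::size_t need = 1;
    if ((c & 0xE0) == 0xC0) {
      need = 2;
    } else if ((c & 0xF0) == 0xE0) {
      need = 3;
    } else if ((c & 0xF8) == 0xF0) {
      need = 4;
    }
    const std::size_t have = back + 1;
    return have < need ? have : 0;
  }
  return 0; // empty buffer, or only continuation bytes seen
}
```

The returned count is the number of bytes to hold back and prepend to the next
chunk before converting.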

>> +      bool DecodeText(std::string raw, std::string& decoded)
>> +      {
>> +        bool success = true;
>> +        decoded = raw;
>> +#if defined(_WIN32)
>> +        if (raw.size() > 0 && codepage != defaultCodepage) {
>> +          success = false;
>> +          const int wlength = MultiByteToWideChar(codepage, 0, raw.c_str(),
>> +                                                  int(raw.size()), NULL, 0);
>
> Why do we need new calls to MultiByteToWideChar instead of
> having clients just directly use kwsysEncoding_mbstowcs?
>

Because WaitForData gives us data and length, and I assume that data might not
be null-terminated, whereas kwsysEncoding_mbstowcs expects a null-terminated
source and doesn't accept a length.
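
For comparison, going through a converter that needs a terminator would mean
copying each chunk first, roughly as in this sketch. ConvertTerminated and
ConvertChunk are hypothetical names, and std::mbstowcs is only a stand-in for
a null-terminated API; it is not what the patch uses. The sizing call with a
null destination relies on the common POSIX extension.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Stand-in for a converter that requires a null-terminated source and takes
// no source length. Uses the current C locale.
std::wstring ConvertTerminated(const char* src)
{
  const std::size_t n = std::mbstowcs(nullptr, src, 0);
  if (n == static_cast<std::size_t>(-1)) {
    return std::wstring(); // invalid sequence for the current locale
  }
  std::vector<wchar_t> buf(n + 1);
  std::mbstowcs(buf.data(), src, n + 1);
  return std::wstring(buf.data(), n);
}

// Hypothetical wrapper for a (data, length) chunk with no terminator.
std::wstring ConvertChunk(const char* data, std::size_t length)
{
  // The copy guarantees a terminator after exactly length bytes.
  const std::string terminated(data, length);
  return ConvertTerminated(terminated.c_str());
}
```

This costs an extra copy per chunk, and any embedded NUL byte in the process
output would cut the conversion short, which a length-based call avoids.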
