2016-07-21 20:46 GMT+03:00 Brad King <brad.k...@kitware.com>:
> On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
>> Anyway I improved this in places where it was easy, but in some places it's
>> more complicated...
>>
>> For example
>>
>>   while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p))
>>   {
>>     // Put the output in the right place.
>>     if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>>       if (output_variable.empty()) {
>>         cmSystemTools::Stdout(data, length);
>>
>> Here we output buffer immediately.
>>
>>   while ((out || err) &&
>>          (p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p))
>>   {
>>     if (out && p == cmsysProcess_Pipe_STDOUT) {
>>       if (!out->Process(data, length)) {
>
> In such cases the data need to be piped through a buffered decoder
> that can keep partial fragments around between updates.
>
> Does MultiByteToWideChar or some other API have a way to detect
> such boundaries?

As far as I know, WinAPI has no such function. MultiByteToWideChar
would replace such a partial character with ? (U+003F) or � (U+FFFD),
so we would need to use some library or implement this ourselves.

WinAPI does have CharPrevExA and IsDBCSLeadByteEx (or GetCPInfo), which
we can use to implement this easily for code pages with 1-2 byte
characters, but that doesn't work for code pages where a character can
be longer than 2 bytes, e.g. UTF-8. Those would need to be handled
separately. We could also check whether the last decoded character is ?
and try again with one byte less.

Using EnumSystemCodePages and GetCPInfoEx I collected info about all
code pages supported on my Windows 10:

https://paste.kde.org/pthwqdbxv/rjwgwd/raw

Code pages where MaxCharSize is greater than 1 and UseLeadByte is No
need special handling that depends on the particular encoding.

>> +  bool DecodeText(std::string raw, std::string& decoded)
>> +  {
>> +    bool success = true;
>> +    decoded = raw;
>> +#if defined(_WIN32)
>> +    if (raw.size() > 0 && codepage != defaultCodepage) {
>> +      success = false;
>> +      const int wlength = MultiByteToWideChar(codepage, 0, raw.c_str(),
>> int(raw.size()), NULL, 0);
>
> Why do we need new calls to MultiByteToWideChar instead of
> having clients just directly use kwsysEncoding_mbstowcs?
>

Because WaitForData gives us data and length, and I assume the data
might not be null-terminated, whereas kwsysEncoding_mbstowcs expects a
null-terminated source and doesn't accept a length.
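
To illustrate the "keep partial fragments around between updates" idea
for the UTF-8 case only, here is a minimal sketch. The names
Utf8CompletePrefix and Utf8ChunkBuffer are hypothetical and are not part
of CMake or KWSys. It only looks for a character cut off at the end of a
chunk and does not validate the input; DBCS code pages would need a
different check (for example one based on IsDBCSLeadByteEx).

```cpp
#include <cstddef>
#include <string>

// Returns how many leading bytes of [data, data + length) can be decoded
// now. Bytes after that point belong to a UTF-8 character that was cut
// off by the chunk boundary. Invalid input is passed through unchanged
// so that the real decoder can deal with it.
std::size_t Utf8CompletePrefix(const char* data, std::size_t length)
{
  // Walk back over at most 3 trailing continuation bytes (10xxxxxx).
  std::size_t back = 0;
  while (back < length && back < 3 &&
         (static_cast<unsigned char>(data[length - 1 - back]) & 0xC0) ==
           0x80) {
    ++back;
  }
  if (back == length) {
    return length; // no lead byte in view (or empty input)
  }
  const unsigned char lead =
    static_cast<unsigned char>(data[length - 1 - back]);
  std::size_t need = 1; // ASCII or an invalid lead byte
  if ((lead & 0xE0) == 0xC0) {
    need = 2;
  } else if ((lead & 0xF0) == 0xE0) {
    need = 3;
  } else if ((lead & 0xF8) == 0xF0) {
    need = 4;
  }
  const std::size_t have = back + 1;
  return have < need ? length - have : length;
}

// Hypothetical helper: accumulates chunks and hands out only whole
// characters, keeping a cut-off tail for the next call.
class Utf8ChunkBuffer
{
public:
  std::string Feed(const char* data, std::size_t length)
  {
    this->Pending.append(data, length);
    const std::size_t n =
      Utf8CompletePrefix(this->Pending.data(), this->Pending.size());
    std::string ready = this->Pending.substr(0, n);
    this->Pending.erase(0, n);
    return ready;
  }

  // Bytes still waiting for the rest of their character.
  const std::string& Leftover() const { return this->Pending; }

private:
  std::string Pending;
};
```

Each (data, length) pair from WaitForData would go through Feed(), and
only the returned bytes would be handed to MultiByteToWideChar with an
explicit length. Whatever is still in Leftover() when the process exits
is a genuinely truncated character and has to be flushed or reported
separately.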