Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
2016-08-02 20:11 GMT+03:00 Brad King:
>
> How are we to know the encoding being produced by the child?
>

There isn't any reliable way to detect it; we just have to know what the particular application uses. There is no standard or API to determine it, but most applications use the console's code page. Not all of them do: it could be OEM, ANSI or something else.

If an application uses the console's code page for pipes, then app.exe | echo will output correctly in cmd, and piping into another application that also uses it will work.

Our issue here is that when CMake checks for the compiler path, it gets output from MSVC, which gives this path in the console's code page. That output is then incorrectly interpreted as UTF-8, so we need to decode it first. If some other application uses a different encoding, it will need separate handling.

--

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers
Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
On 07/21/2016 08:43 PM, Dāvis Mosāns wrote:
> With MultiByteToWideChar such partial char would be replaced with ? (U+003F)
> or � (U+FFFD).
[snip]
> Also could check if last character is ? and try again with one byte less.

This may be a good middle ground. The excess bytes would then be buffered for inclusion at the beginning of the next block conversion.

How are we to know the encoding being produced by the child?

> from WaitForData we're getting data and length, and I assume that data
> might not be null-terminated but kwsysEncoding_mbstowcs expects source to be
> null-terminated and doesn't accept length.

Okay, thanks. Using MultiByteToWideChar for Windows-specific code is fine. If we ever need to offer a similar generalization then we can provide a kwsysEncoding_mbsnrtowcs later.

BTW, with all your changes to KWSys it may be easier to iterate if you contribute to KWSys directly. Please see instructions here:

 http://public.kitware.com/Wiki/KWSys/Git/Develop

Thanks,
-Brad
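For reference, the length-based conversion discussed above could look roughly like the following. This is only a sketch, not the code from the patch: the name DecodeToUtf8 is hypothetical, and it assumes the internally used encoding is UTF-8. It passes an explicit byte count to MultiByteToWideChar, so the buffer returned by WaitForData does not have to be null-terminated. On non-Windows builds the sketch returns the bytes unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#if defined(_WIN32)
#  include <windows.h>
#endif

// Hypothetical helper: convert `length` bytes in code page `codepage` to
// UTF-8. The input does not need to be null-terminated. If any conversion
// step fails, the original bytes are returned unchanged.
std::string DecodeToUtf8(const char* data, int length, unsigned int codepage)
{
  assert(length >= 0);
  std::string result(data, static_cast<std::size_t>(length));
#if defined(_WIN32)
  if (length > 0) {
    // First call: ask how many wide characters are needed.
    const int wlen = MultiByteToWideChar(codepage, 0, data, length, NULL, 0);
    if (wlen > 0) {
      std::vector<wchar_t> wide(static_cast<std::size_t>(wlen));
      MultiByteToWideChar(codepage, 0, data, length, &wide[0], wlen);
      // Same two-step pattern for UTF-16 to UTF-8. The last two arguments
      // must be NULL when the target is CP_UTF8.
      const int ulen =
        WideCharToMultiByte(CP_UTF8, 0, &wide[0], wlen, NULL, 0, NULL, NULL);
      if (ulen > 0) {
        std::vector<char> utf8(static_cast<std::size_t>(ulen));
        WideCharToMultiByte(CP_UTF8, 0, &wide[0], wlen, &utf8[0], ulen, NULL,
                            NULL);
        result.assign(&utf8[0], utf8.size());
      }
    }
  }
#else
  (void)codepage; // no code page conversion outside Windows in this sketch
#endif
  return result;
}
```

Note that this converts one block in isolation, so it still has the split-character problem raised earlier in the thread unless the caller holds back incomplete trailing bytes first.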
Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
2016-07-21 20:46 GMT+03:00 Brad King:
> On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
>> Anyway I improved this in places where it was easy, but in some places it's
>> more complicated...
>>
>> For example
>>
>>   while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>>     // Put the output in the right place.
>>     if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>>       if (output_variable.empty()) {
>>         cmSystemTools::Stdout(data, length);
>>
>> Here we output buffer immediately.
>>
>>   while ((out || err) &&
>>          (p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>>     if (out && p == cmsysProcess_Pipe_STDOUT) {
>>       if (!out->Process(data, length)) {
>
> In such cases the data need to be piped through a buffered decoder
> that can keep partial fragments around between updates.
>
> Does MultiByteToWideChar or some other API have a way to detect
> such boundaries?

As far as I know there isn't any such function in WinAPI. With MultiByteToWideChar such a partial char would be replaced with ? (U+003F) or � (U+FFFD). We would need to use some library or implement this ourselves.

In WinAPI there are CharPrevExA and IsDBCSLeadByteEx (or GetCPInfo), which we can use to implement this easily for 1-2 byte code pages, but that doesn't work for code pages where a character can be more than 2 bytes, e.g. UTF-8. Those would need to be handled separately. We could also check if the last character is ? and try again with one byte less.

Using EnumSystemCodePages and GetCPInfoEx I collected info about all supported code pages on my Windows 10:

 https://paste.kde.org/pthwqdbxv/rjwgwd/raw

Code pages where MaxCharSize is more than 1 and UseLeadByte is No need special handling that depends on the particular encoding.

>> +  bool DecodeText(std::string raw, std::string& decoded)
>> +  {
>> +    bool success = true;
>> +    decoded = raw;
>> +#if defined(_WIN32)
>> +    if (raw.size() > 0 && codepage != defaultCodepage) {
>> +      success = false;
>> +      const int wlength = MultiByteToWideChar(codepage, 0, raw.c_str(),
>> int(raw.size()), NULL, 0);
>
> Why do we need new calls to MultiByteToWideChar instead of
> having clients just directly use kwsysEncoding_mbstowcs?
>

Because from WaitForData we're getting data and length, and I assume that data might not be null-terminated, but kwsysEncoding_mbstowcs expects the source to be null-terminated and doesn't accept a length.
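The lead-byte approach for 1-2 byte code pages can be sketched as below. This is an illustration, not code from the patch. DbcsCompletePrefix and IsLeadByteCp932 are hypothetical names, and the predicate is a portable stand-in for IsDBCSLeadByteEx with the lead-byte ranges assumed for code page 932 (0x81-0x9F and 0xE0-0xFC). The scan runs forward from the start of the buffer, which is assumed to be a character boundary, because a trail byte may fall in the lead-byte range and so the last byte cannot be classified on its own.

```cpp
#include <cassert>
#include <cstddef>

typedef bool (*LeadBytePredicate)(unsigned char);

// Hypothetical stand-in for IsDBCSLeadByteEx(932, b).
// Assumption: lead-byte ranges of code page 932 (Shift-JIS).
bool IsLeadByteCp932(unsigned char b)
{
  return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Returns how many leading bytes of the buffer form complete characters in
// a code page with 1- and 2-byte characters. The result is either `length`
// or `length - 1`; in the latter case the last byte is a dangling lead byte
// that should be kept and prepended to the next block.
std::size_t DbcsCompletePrefix(const char* data, std::size_t length,
                               LeadBytePredicate isLead)
{
  assert(isLead != NULL);
  std::size_t i = 0;
  while (i < length) {
    if (isLead(static_cast<unsigned char>(data[i]))) {
      if (i + 1 >= length) {
        return i; // lead byte without its trail byte
      }
      i += 2; // skip the lead byte and its trail byte together
    } else {
      i += 1; // single-byte character
    }
  }
  return length;
}
```

For example, with the bytes 0x82 0xA0 0x82 the function returns 2, while for 0x82 0x82 it returns 2 as well, since the second 0x82 is consumed as a trail byte.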
Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
> Anyway I improved this in places where it was easy, but in some places it's
> more complicated...
>
> For example
>
>   while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>     // Put the output in the right place.
>     if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>       if (output_variable.empty()) {
>         cmSystemTools::Stdout(data, length);
>
> Here we output buffer immediately.
>
>   while ((out || err) &&
>          (p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>     if (out && p == cmsysProcess_Pipe_STDOUT) {
>       if (!out->Process(data, length)) {

In such cases the data need to be piped through a buffered decoder that can keep partial fragments around between updates.

Does MultiByteToWideChar or some other API have a way to detect such boundaries?

-Brad
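A minimal sketch of such a buffered stage, under the assumption that the child writes UTF-8 (other multi-byte encodings need their own boundary rule). Utf8CompletePrefix and Utf8ChunkBuffer are hypothetical names, not part of CMake or KWSys. Each call to Feed returns only the bytes that end on a character boundary and keeps an incomplete trailing sequence for the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the number of leading bytes that end on a UTF-8 character
// boundary. Only the final sequence is inspected; input that is malformed
// elsewhere is passed through rather than diagnosed.
std::size_t Utf8CompletePrefix(const char* data, std::size_t length)
{
  std::size_t i = length;
  std::size_t back = 0; // bytes examined so far, counted from the end
  while (i > 0 && back < 4) {
    --i;
    ++back;
    const unsigned char c = static_cast<unsigned char>(data[i]);
    if ((c & 0xC0) != 0x80) { // not a continuation byte: a sequence starts here
      std::size_t need = 1;
      if ((c & 0xE0) == 0xC0) {
        need = 2;
      } else if ((c & 0xF0) == 0xE0) {
        need = 3;
      } else if ((c & 0xF8) == 0xF0) {
        need = 4;
      }
      // `back` bytes are available for a sequence that needs `need` bytes.
      return back < need ? i : length;
    }
  }
  return length; // empty input, or no start byte found in the last 4 bytes
}

// Keeps partial fragments between updates, as suggested above.
class Utf8ChunkBuffer
{
public:
  // Appends a block and returns the part that is safe to decode now.
  std::string Feed(const char* data, std::size_t length)
  {
    assert(data != NULL || length == 0);
    if (length > 0) {
      this->Pending.append(data, length);
    }
    const std::size_t n =
      Utf8CompletePrefix(this->Pending.data(), this->Pending.size());
    std::string ready = this->Pending.substr(0, n);
    this->Pending.erase(0, n);
    return ready;
  }

  // Bytes still waiting for the rest of their character.
  const std::string& Remainder() const { return this->Pending; }

private:
  std::string Pending;
};
```

In the loops quoted above, each block from cmsysProcess_WaitForData would go through Feed before being printed or processed. Whatever is left in Remainder when the child exits is a truncated character and needs a policy of its own.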
Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
2016-07-18 17:04 GMT+03:00 Brad King:
> On 07/07/2016 05:54 PM, Dāvis Mosāns wrote:
>> Typically Windows applications (eg. MSVC compiler) use current console's
>> codepage for output to pipes so we need to encode that to internally used
>> encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE).
> [snip]
>>    while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>> +    cmsysProcess_DecodeTextOutput(cp, &data, &length);
> [snip]
>>    while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
>> +    cmsysProcess_DecodeTextOutput(cp, &data, &length);
>
> Unfortunately I don't think that pattern will work reliably because
> a multi-byte character could be split across a buffering boundary
> and therefore not decoded correctly. We may need the consuming
> contexts to collect the whole output before converting.
>

That's true, but only for MBCS code pages; it will work fine for SBCS and DBCS (if the buffer size is even). I think this issue would appear very rarely.

Anyway I improved this in places where it was easy, but in some places it's more complicated...

For example

  while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
    // Put the output in the right place.
    if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
      if (output_variable.empty()) {
        cmSystemTools::Stdout(data, length);

Here we output the buffer immediately.

  while ((out || err) &&
         (p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
    if (out && p == cmsysProcess_Pipe_STDOUT) {
      if (!out->Process(data, length)) {

and also here we process the buffer immediately.
Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding
On 07/07/2016 05:54 PM, Dāvis Mosāns wrote:
> Typically Windows applications (eg. MSVC compiler) use current console's
> codepage for output to pipes so we need to encode that to internally used
> encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE).
[snip]
>    while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
> +    cmsysProcess_DecodeTextOutput(cp, &data, &length);
[snip]
>    while ((p = cmsysProcess_WaitForData(cp, &data, &length, CM_NULLPTR), p)) {
> +    cmsysProcess_DecodeTextOutput(cp, &data, &length);

Unfortunately I don't think that pattern will work reliably because a multi-byte character could be split across a buffering boundary and therefore not decoded correctly. We may need the consuming contexts to collect the whole output before converting.

Also, I'd like to avoid modifying KWSys Process for this if possible because it may be replaced by libuv so we need the conversion code to be in CMake proper.

Thanks,
-Brad