Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-08-02 Thread Dāvis Mosāns
2016-08-02 20:11 GMT+03:00 Brad King :
>
> How are we to know the encoding being produced by the child?
>

There isn't any reliable way to detect it, we just have to know what particular
application uses. Also there aren't any standard or API to determine it but
generally most applications use console's code page. Of course not all, it
could be OEM, ANSI or something else.

If application uses console's code page for pipes then app.exe | echo will
output correctly in cmd and piping in other application which also uses
it will work.

Here our issue is that when CMake checks for compiler path it gets output
from MSVC and it will give this path in console's codepage which will be
incorrectly interpreted as UTF-8 so we need to decode it first.

If there will some other application which uses different encoding then it will
need separate handling for that.
-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers


Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-08-02 Thread Brad King
On 07/21/2016 08:43 PM, Dāvis Mosāns wrote:
> With MultiByteToWideChar such partial char would be replaced with ? (U+003F)
> or � (U+FFFD).
[snip]
> Also could check if last character is ? and try again with one byte less.

This may be a good middle ground.  The excess bytes would then be buffered
for inclusion at the beginning of the next block conversion.

How are we to know the encoding being produced by the child?

> from WaitForData we're getting data and length, and I assume that data
> might not be null-terminated but kwsysEncoding_mbstowcs expects source to be
> null-terminated and doesn't accept length.

Okay, thanks.  Using MultiByteToWideChar for Windows-specific code is
fine.  If we ever need to offer a similar generalization then we can
provide a kwsysEncoding_mbsnrtowcs later.

BTW, with all your changes to KWSys it may be easier to iterate
if you contribute to KWSys directly.  Please see instructions here:

 http://public.kitware.com/Wiki/KWSys/Git/Develop

Thanks,
-Brad
-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers

Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-07-21 Thread Dāvis Mosāns
2016-07-21 20:46 GMT+03:00 Brad King :
> On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
>> Anyway I improved this in places where it was easy, but in some places it's
>> more complicated...
>>
>> For example
>>
>>while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) 
>> {
>>  // Put the output in the right place.
>>  if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>>if (output_variable.empty()) {
>>  cmSystemTools::Stdout(data, length);
>>
>> Here we output buffer immediately.
>>
>>  while ((out || err) &&
>>   (p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) 
>> {
>>  if (out && p == cmsysProcess_Pipe_STDOUT) {
>>if (!out->Process(data, length)) {
>
> In such cases the data need to be piped through a buffered decoder
> that can keep partial fragments around between updates.
>
> Does MultiByteToWideChar or some other API have a way to detect
> such boundaries?

As far as I know in WinAPI there isn't any such function.
With MultiByteToWideChar such partial char would be replaced with ? (U+003F)
or � (U+FFFD).

We would need to use some library or implement this ourselves.
In WinAPI there's CharPrevExA and IsDBCSLeadByteEx (or GetCPInfo) which we can
use and implement this easily for 1-2 byte code pages but it doesn't work for
code pages where character can be more than 2 bytes, eg. UTF-8. Those would
need to be handled separately.
Also could check if last character is ? and try again with one byte less.


Using EnumSystemCodePages and GetCPInfoEx I collected info about all supported
code pages on my Windows 10

https://paste.kde.org/pthwqdbxv/rjwgwd/raw

Code pages where MaxCharSize is more than 1 and UseLeadByte is No need special
handling for those depending on that particular encoding.

>> +  bool DecodeText(std::string raw, std::string& decoded)
>> +  {
>> +bool success = true;
>> +decoded = raw;
>> +#if defined(_WIN32)
>> +if (raw.size() > 0 && codepage != defaultCodepage) {
>> +  success = false;
>> +  const int wlength = MultiByteToWideChar(codepage, 0, raw.c_str(), 
>> int(raw.size()), NULL, 0);
>
> Why do we need new calls to MultiByteToWideChar instead of
> having clients just directly use kwsysEncoding_mbstowcs?
>

Because from WaitForData we're getting data and length, and I assume that data
might not be null-terminated but kwsysEncoding_mbstowcs expects source to be
null-terminated and doesn't accept length.
-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers

Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-07-21 Thread Brad King
On 07/21/2016 01:36 PM, Dāvis Mosāns wrote:
> Anyway I improved this in places where it was easy, but in some places it's
> more complicated...
> 
> For example
> 
>while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
>  // Put the output in the right place.
>  if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
>if (output_variable.empty()) {
>  cmSystemTools::Stdout(data, length);
> 
> Here we output buffer immediately.
> 
>  while ((out || err) &&
>   (p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
>  if (out && p == cmsysProcess_Pipe_STDOUT) {
>if (!out->Process(data, length)) {

In such cases the data need to be piped through a buffered decoder
that can keep partial fragments around between updates.

Does MultiByteToWideChar or some other API have a way to detect
such boundaries?

-Brad

-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers

Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-07-21 Thread Dāvis Mosāns
2016-07-18 17:04 GMT+03:00 Brad King :
> On 07/07/2016 05:54 PM, Dāvis Mosāns wrote:
>> Typically Windows applications (eg. MSVC compiler) use current console's
>> codepage for output to pipes so we need to encode that to internally used
>> encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE).
> [snip]
>>while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) 
>> {
>> +cmsysProcess_DecodeTextOutput(cp, , );
> [snip]
>>while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) 
>> {
>> +cmsysProcess_DecodeTextOutput(cp, , );
>
> Unfortunately I don't think that pattern will work reliably because
> a multi-byte character could be split across a buffering boundary
> and therefore not decoded correctly.  We may need the consuming
> contexts to collect the whole output before converting.
>

That's true, but only for MBCS code pages, it will work fine for SBCS and DBCS
(if buffer size is even). I think this issue would appear very rarely.
Anyway I improved this in places where it was easy, but in some places it's
more complicated...

For example

   while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
 // Put the output in the right place.
 if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
   if (output_variable.empty()) {
 cmSystemTools::Stdout(data, length);

Here we output buffer immediately.

 while ((out || err) &&
  (p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
 if (out && p == cmsysProcess_Pipe_STDOUT) {
   if (!out->Process(data, length)) {

and also here we process buffer immediately.
-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers

Re: [cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-07-18 Thread Brad King
On 07/07/2016 05:54 PM, Dāvis Mosāns wrote:
> Typically Windows applications (eg. MSVC compiler) use current console's
> codepage for output to pipes so we need to encode that to internally used
> encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE).
[snip]
>while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
> +cmsysProcess_DecodeTextOutput(cp, , );
[snip]
>while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
> +cmsysProcess_DecodeTextOutput(cp, , );

Unfortunately I don't think that pattern will work reliably because
a multi-byte character could be split across a buffering boundary
and therefore not decoded correctly.  We may need the consuming
contexts to collect the whole output before converting.

Also, I'd like to avoid modifying KWSys Process for this if possible
because it may be replaced by libuv so we need the conversion code
to be in CMake proper.

Thanks,
-Brad

-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers

[cmake-developers] [PATCH v4 4/4] For Windows encode process output to internally used encoding

2016-07-07 Thread Dāvis Mosāns
Typically Windows applications (eg. MSVC compiler) use current console's
codepage for output to pipes so we need to encode that to internally used
encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE).
---
 Source/cmExecProgramCommand.cxx|  1 +
 Source/cmExecuteProcessCommand.cxx |  1 +
 Source/cmProcessTools.cxx  |  2 ++
 Source/cmSystemTools.cxx   |  3 +++
 Source/kwsys/CMakeLists.txt|  2 ++
 Source/kwsys/Process.h.in  | 13 +
 Source/kwsys/ProcessUNIX.c |  6 ++
 Source/kwsys/ProcessWin32.c| 33 -
 Source/kwsys/SystemInformation.cxx |  1 +
 9 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Source/cmExecProgramCommand.cxx b/Source/cmExecProgramCommand.cxx
index 58bbc31..92b89fc 100644
--- a/Source/cmExecProgramCommand.cxx
+++ b/Source/cmExecProgramCommand.cxx
@@ -220,6 +220,7 @@ bool cmExecProgramCommand::RunCommand(const char* command, 
std::string& output,
   char* data;
   int p;
   while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
+cmsysProcess_DecodeTextOutput(cp, , );
 if (p == cmsysProcess_Pipe_STDOUT || p == cmsysProcess_Pipe_STDERR) {
   if (verbose) {
 cmSystemTools::Stdout(data, length);
diff --git a/Source/cmExecuteProcessCommand.cxx 
b/Source/cmExecuteProcessCommand.cxx
index d97b25f..b13fb2e 100644
--- a/Source/cmExecuteProcessCommand.cxx
+++ b/Source/cmExecuteProcessCommand.cxx
@@ -229,6 +229,7 @@ bool 
cmExecuteProcessCommand::InitialPass(std::vector const& args,
   char* data;
   int p;
   while ((p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
+cmsysProcess_DecodeTextOutput(cp, , );
 // Put the output in the right place.
 if (p == cmsysProcess_Pipe_STDOUT && !output_quiet) {
   if (output_variable.empty()) {
diff --git a/Source/cmProcessTools.cxx b/Source/cmProcessTools.cxx
index 34b8df2..8ab8070 100644
--- a/Source/cmProcessTools.cxx
+++ b/Source/cmProcessTools.cxx
@@ -23,10 +23,12 @@ void cmProcessTools::RunProcess(struct cmsysProcess_s* cp, 
OutputParser* out,
   while ((out || err) &&
  (p = cmsysProcess_WaitForData(cp, , , CM_NULLPTR), p)) {
 if (out && p == cmsysProcess_Pipe_STDOUT) {
+  cmsysProcess_DecodeTextOutput(cp, , );
   if (!out->Process(data, length)) {
 out = CM_NULLPTR;
   }
 } else if (err && p == cmsysProcess_Pipe_STDERR) {
+  cmsysProcess_DecodeTextOutput(cp, , );
   if (!err->Process(data, length)) {
 err = CM_NULLPTR;
   }
diff --git a/Source/cmSystemTools.cxx b/Source/cmSystemTools.cxx
index 9740ef7..aeb5471 100644
--- a/Source/cmSystemTools.cxx
+++ b/Source/cmSystemTools.cxx
@@ -616,6 +616,7 @@ bool 
cmSystemTools::RunSingleCommand(std::vector const& command,
   (captureStdOut || captureStdErr || outputflag != OUTPUT_NONE)) {
 while ((pipe = cmsysProcess_WaitForData(cp, , , CM_NULLPTR)) >
0) {
+  cmsysProcess_DecodeTextOutput(cp, , );
   // Translate NULL characters in the output into valid text.
   // Visual Studio 7 puts these characters in the output of its
   // build process.
@@ -1689,11 +1690,13 @@ int cmSystemTools::WaitForLine(cmsysProcess* process, 
std::string& line,
   return pipe;
 } else if (pipe == cmsysProcess_Pipe_STDOUT) {
   // Append to the stdout buffer.
+  cmsysProcess_DecodeTextOutput(process, , );
   std::vector::size_type size = out.size();
   out.insert(out.end(), data, data + length);
   outiter = out.begin() + size;
 } else if (pipe == cmsysProcess_Pipe_STDERR) {
   // Append to the stderr buffer.
+  cmsysProcess_DecodeTextOutput(process, , );
   std::vector::size_type size = err.size();
   err.insert(err.end(), data, data + length);
   erriter = err.begin() + size;
diff --git a/Source/kwsys/CMakeLists.txt b/Source/kwsys/CMakeLists.txt
index 39b03b3..65203c0 100644
--- a/Source/kwsys/CMakeLists.txt
+++ b/Source/kwsys/CMakeLists.txt
@@ -708,6 +708,8 @@ IF(KWSYS_USE_Process)
   IF(NOT UNIX)
 # Use the Windows implementation.
 SET(KWSYS_C_SRCS ${KWSYS_C_SRCS} ProcessWin32.c)
+SET_PROPERTY(SOURCE ProcessWin32.c APPEND PROPERTY COMPILE_DEFINITIONS
+  KWSYS_ENCODING_DEFAULT_CODEPAGE=${KWSYS_ENCODING_DEFAULT_CODEPAGE})
   ELSE()
 # Use the UNIX implementation.
 SET(KWSYS_C_SRCS ${KWSYS_C_SRCS} ProcessUNIX.c)
diff --git a/Source/kwsys/Process.h.in b/Source/kwsys/Process.h.in
index 96563a2..0c79b47 100644
--- a/Source/kwsys/Process.h.in
+++ b/Source/kwsys/Process.h.in
@@ -67,6 +67,7 @@
 # define kwsysProcess_Execute   kwsys_ns(Process_Execute)
 # define kwsysProcess_Disownkwsys_ns(Process_Disown)
 # define kwsysProcess_WaitForData   kwsys_ns(Process_WaitForData)
+# define kwsysProcess_DecodeTextOutput  
kwsys_ns(Process_DecodeTextOutput)
 # define kwsysProcess_Pipes_e   kwsys_ns(Process_Pipes_e)
 # define kwsysProcess_Pipe_None