[cmake-developers] [PATCH v2] Improve encoding handling on Windows

2016-07-02 Thread Dāvis Mosāns
On Windows getenv (and putenv) uses ANSI codepage so it needs to
be encoded to internally used encoding (eg. UTF-8). Here we use
_wgetenv (and _wputenv) instead and encode that.

Also typically Windows applications (eg. MSVC compiler) use current
console's codepage for output to pipes so we need to encode that
to internally used encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE).

Next, when we're outputing to console need to use wide functions.

This change allows that compilers such as MSVC on Windows can be
installed in non-ASCII path and will work correctly for all
console's codepages which supports that path's characters.
---
 Source/CPack/cmCPackGenerator.cxx   |  8 +--
 Source/CTest/cmCTestCoverageHandler.cxx | 12 ++--
 Source/CTest/cmCTestCurl.cxx| 21 +++
 Source/CTest/cmCTestMultiProcessHandler.cxx |  8 +--
 Source/cmBuildCommand.cxx   | 23 +++
 Source/cmCLocaleEnvironmentScope.cxx|  4 +-
 Source/cmCTest.cxx  | 11 ++--
 Source/cmCommandArgumentParserHelper.cxx|  8 +--
 Source/cmConditionEvaluator.cxx |  2 +-
 Source/cmExportCommand.cxx  |  5 +-
 Source/cmExtraEclipseCDT4Generator.cxx  |  9 +--
 Source/cmFileCommand.cxx| 12 ++--
 Source/cmFindPackageCommand.cxx |  4 +-
 Source/cmGlobalVisualStudio7Generator.cxx   |  6 +-
 Source/cmMakefile.cxx   |  5 +-
 Source/cmNinjaTargetGenerator.cxx   |  2 +-
 Source/cmQtAutoGenerators.cxx   |  2 +-
 Source/cmSetCommand.cxx |  6 +-
 Source/cmState.cxx  |  5 +-
 Source/cmSystemTools.cxx| 11 +++-
 Source/cmTimestamp.cxx  |  7 +--
 Source/cmake.cxx| 16 ++---
 Source/cmakemain.cxx| 38 ++-
 Source/cmcmd.cxx| 12 ++--
 Source/kwsys/CMakeLists.txt |  2 +
 Source/kwsys/Directory.cxx  |  2 +-
 Source/kwsys/FStream.hxx.in | 19 +-
 Source/kwsys/ProcessWin32.c | 25 +++-
 Source/kwsys/SystemInformation.cxx  | 20 +++---
 Source/kwsys/SystemTools.cxx| 98 +++--
 Source/kwsys/SystemTools.hxx.in |  4 +-
 Source/kwsys/testSystemTools.cxx|  9 +--
 32 files changed, 260 insertions(+), 156 deletions(-)

diff --git a/Source/CPack/cmCPackGenerator.cxx 
b/Source/CPack/cmCPackGenerator.cxx
index df8bb0f..76609e1 100644
--- a/Source/CPack/cmCPackGenerator.cxx
+++ b/Source/CPack/cmCPackGenerator.cxx
@@ -1074,11 +1074,11 @@ const char* cmCPackGenerator::GetInstallPath()
 return this->InstallPath.c_str();
   }
 #if defined(_WIN32) && !defined(__CYGWIN__)
-  const char* prgfiles = cmsys::SystemTools::GetEnv("ProgramFiles");
-  const char* sysDrive = cmsys::SystemTools::GetEnv("SystemDrive");
-  if (prgfiles) {
+  std::string prgfiles;
+  std::string sysDrive;
+  if (cmsys::SystemTools::GetEnv("ProgramFiles", prgfiles)) {
 this->InstallPath = prgfiles;
-  } else if (sysDrive) {
+  } else if (cmsys::SystemTools::GetEnv("SystemDrive", sysDrive)) {
 this->InstallPath = sysDrive;
 this->InstallPath += "/Program Files";
   } else {
diff --git a/Source/CTest/cmCTestCoverageHandler.cxx 
b/Source/CTest/cmCTestCoverageHandler.cxx
index 7102533..9410a52 100644
--- a/Source/CTest/cmCTestCoverageHandler.cxx
+++ b/Source/CTest/cmCTestCoverageHandler.cxx
@@ -727,10 +727,7 @@ int cmCTestCoverageHandler::HandleCoberturaCoverage(
   // if it doesn't exist or is empty, assume the
   // binary directory is used.
   std::string coverageXMLFile;
-  const char* covDir = cmSystemTools::GetEnv("COBERTURADIR");
-  if (covDir && strlen(covDir) != 0) {
-coverageXMLFile = std::string(covDir);
-  } else {
+  if (!cmSystemTools::GetEnv("COBERTURADIR", coverageXMLFile) || 
coverageXMLFile.empty()) {
 coverageXMLFile = this->CTest->GetBinaryDir();
   }
   // build the find file string with the directory from above
@@ -791,7 +788,8 @@ struct cmCTestCoverageHandlerLocale
 {
   cmCTestCoverageHandlerLocale()
   {
-if (const char* l = cmSystemTools::GetEnv("LC_ALL")) {
+std::string l;
+if (cmSystemTools::GetEnv("LC_ALL", l)) {
   lc_all = l;
 }
 if (lc_all != "C") {
@@ -2121,8 +2119,8 @@ int cmCTestCoverageHandler::RunBullseyeSourceSummary(
 int cmCTestCoverageHandler::HandleBullseyeCoverage(
   cmCTestCoverageHandlerContainer* cont)
 {
-  const char* covfile = cmSystemTools::GetEnv("COVFILE");
-  if (!covfile || strlen(covfile) == 0) {
+  std::string covfile;
+  if (!cmSystemTools::GetEnv("COVFILE", covfile) || covfile.empty()) {
 cmCTestOptionalLog(this->CTest, HANDLER_VERBOSE_OUTPUT,
" COVFILE environment variable not found, not running "
" bullseye\n",
diff --git a/Source/CTest/cmCTestCurl.cxx b/Source/CTest/cmCTestCurl.cxx
index 

Re: [cmake-developers] [PATCH] Improve encoding handling on Windows

2016-07-02 Thread Dāvis Mosāns
2016-07-01 17:41 GMT+03:00 Mike Gelfand :

>
> Since you already have "bool SystemTools::GetEnv(const char* key,
> std::string& result)", another option would be to use it everywhere and
> maybe introduce something like "bool SystemTools::HasEnv(const char*
> key)" for those several cases where you only need to check the existence.
>
>
I implemented this, it's actually really nice API to work with.


2016-07-02 1:54 GMT+03:00 :

>
> ...
>
> > +#if defined(_WIN32)
> > +  // Kinda hack, with MSVC (and MinGW) for some reason std::wcout
> > +  // (and all other std::w*) fails once it encounters non-ASCII
> > +  // string unless locale is set.
> > +  // Note that even with this, seems it can't output characters
> > +  // which aren't present in ANSI codepage no matter what encoding
> > +  // is used for console.
> > +  // Also once any character outside of ANSI codepage is tried to
> > +  // be outputed then after there anymore won't be output from
> > +  // any of std::w* functions.
> > +  _wsetlocale(LC_ALL, L"");
> > +#endif
>
> The _wsetlocale() may be a problem.
>
> See:
> https://gitlab.kitware.com/cmake/cmake/commit/87be2e142
> https://cmake.org/Bug/view.php?id=16099
>
>
>
Indeed, good catch, thanks, I didn't thought about this. But even now
most these locale aware functions like tolower/toupper and others are
wrong because internally we use UTF-8 and there 1 character can take
more than 1 byte so these functions won't work correctly for some strings
even if we don't set any locale.
Now here we actually set it only on Windows because there just isn't any
other way. Without setting locale we get only ASCII support and can't
output even ANSI characters. With locale we can atleast output ANSI
characters.
Currently Microsoft C++ library doesn't support UTF-8/UTF-16 locales.
Only way to output Unicode would be implement our own std::wcout which
would use wide WinAPI to write to console.

Anyway quick fix is to always use English locale then number parsing
will be expected and set user's codepage.

  std::wstring locale = L"English_United States.";
  locale += std::to_wstring(GetACP());
  _wsetlocale(LC_ALL, locale.c_str());


Of course proper Unicode support will be needed some day, but for now
this is still an improvement.


> @@ -30,8 +30,23 @@ namespace @KWSYS_NAMESPACE@
> >   typedef std::basic_filebuf my_base_type;
> >   basic_filebuf *open(char const *s,std::ios_base::openmode mode)
> >   {
> > +std::wstring wstr = Encoding::ToWide(s);
> > +const wchar_t *ws = wstr.c_str();
> > +#if defined(_MSC_VER) && _MSC_VER >= 1400
> > +const wchar_t *ss = ws;
> > +#else
> > +const char *ss = 0;
> > +size_t length = std::wcstombs(0, ws, 0);
> > +if (length != size_t(-1)) {
> > +  std::vector str(length + 1);
> > +  ss = str.data();
> > +  std::wcstombs((char *)ss, ws, str.size());
> > +} else {
> > +  ss = s;
> > +}
> > +#endif
> > return static_cast(
> > -  my_base_type::open(Encoding::ToWide(s).c_str(), mode)
> > +  my_base_type::open(ss, mode)
> >   );
>
> Yes.  It makes sense to convert from utf-8 to code page when we are not
> using the wide apis.
> This would at least give you support for the current code page, beyond
> ascii.   Beyond that, the wide functions should be used.
> Which compiler are you trying to support here, which doesn't give a wide
> open()?
>
>
Only MSVC have ofstream::open(const wchar_t *) so for MinGW need to use
ofstream::open(const char *) or use wofstream::open(const wchar_t *) which
would
require quite big changes.


> >   }
> >   };
> > diff --git a/Source/kwsys/ProcessWin32.c b/Source/kwsys/ProcessWin32.c
> > index 2b93e69..208e725 100644
> > --- a/Source/kwsys/ProcessWin32.c
> > +++ b/Source/kwsys/ProcessWin32.c
> > @@ -181,7 +181,7 @@ struct kwsysProcessPipeData_s
> >   /* - Data managed per call to Execute - */
> >
> >   /* Buffer for data read in this pipe's thread.  */
> > -  char DataBuffer[KWSYSPE_PIPE_BUFFER_SIZE];
> > +  char DataBuffer[KWSYSPE_PIPE_BUFFER_SIZE*2];
>
> This "*2" assumes all characters cmake sees will fit in the Basic
> Multilingual Plane.
> Are we OK assuming that?
> If the conversion from a code page to utf-16 results in surrogate pairs,
> there may not be enough space.
>
>
Not really, it assumes character string from ANSI code page encoded
to internal encoding (UTF-8) will not take more than 2x space. Which I
think is a pretty good guess, because here we're processing output
from a process and so output will be paths and texts usually in
ASCII with not many characters which would require more than 2 bytes.
And it will still work if say 1/4th uses 4 bytes and 1 byte for rest 3/4th.

1024 * 1/4 * 4 bytes + 1024 * 3/4 * 1 byte (1792) < 1024 * 2 (2048)

But of course can increase it to 4x, only I think it's very 

[cmake-developers] Green Hills MULTI Generator Recommendations

2016-07-02 Thread Chris Bux
I am interested in using CMake with Green Hills MULTI.  I see that there is
some experimental support for MULTI in CMake already, but that support
appears to be tied to ARM and assumes the Green Hills INTEGRITY RTOS is
also being used.  I would also like to build for the x86 and Renesas V850
architectures.

I have hacked in support for other targets just to get something working
locally, but I could use some recommendations on how to proceed so I can
submit patches to the CMake project.

I have a couple of questions.

1. GHS MULTI supports multiple processor architectures, but each of the
architectures use different compiler/linker names.  As an example, ARM uses
ccarm/cxarm but the v850 uses ccv850/cxv850.  Should the MULTI generator
provide a list of known compiler names by architecture, or should we rely
on users to set the appropriate CMAKE__COMPILER variables in the
toolchain file? If the generator should provide a list of known compiler
names, would CMake's -A option be the appropriate way to specify the target
processor's platform?  Is there another compiler toolchain that might be a
good template to copy from?

2. Is there a way in a toolchain file or on the command line to specify the
directory where CMake should look for the compiler executables?  I tried
creating a variable in a toolchain file, but it appears that CMake tries to
figure out the CMAKE__COMPILER value before the toolchain variable I
created is available.

3. Is there a way to specify the make executable full path in the toolchain
file?  GHS MULTI uses their own program called gbuild to drive their
builds.  At least for our team, it would not be a good idea for us to put
the gbuild executable in the PATH variable because we will be working with
more than one architecture and we would need to specify exactly which
gbuild executable to use - we wouldn't want the ARM gbuild to run when we
are building for x86.  I tried setting CMAKE_MAKE_PROGRAM in the toolchain
file, but the CMAKE_MAKE_PROGRAM variable I set in the toolchain file does
not appear to be available at the time FindMakeProgram() in the generator
is called.
-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers