[cmake-developers] [PATCH v2] Improve encoding handling on Windows
On Windows getenv (and putenv) uses ANSI codepage so it needs to be encoded to internally used encoding (eg. UTF-8). Here we use _wgetenv (and _wputenv) instead and encode that. Also typically Windows applications (eg. MSVC compiler) use current console's codepage for output to pipes so we need to encode that to internally used encoding (KWSYS_ENCODING_DEFAULT_CODEPAGE). Next, when we're outputing to console need to use wide functions. This change allows that compilers such as MSVC on Windows can be installed in non-ASCII path and will work correctly for all console's codepages which supports that path's characters. --- Source/CPack/cmCPackGenerator.cxx | 8 +-- Source/CTest/cmCTestCoverageHandler.cxx | 12 ++-- Source/CTest/cmCTestCurl.cxx| 21 +++ Source/CTest/cmCTestMultiProcessHandler.cxx | 8 +-- Source/cmBuildCommand.cxx | 23 +++ Source/cmCLocaleEnvironmentScope.cxx| 4 +- Source/cmCTest.cxx | 11 ++-- Source/cmCommandArgumentParserHelper.cxx| 8 +-- Source/cmConditionEvaluator.cxx | 2 +- Source/cmExportCommand.cxx | 5 +- Source/cmExtraEclipseCDT4Generator.cxx | 9 +-- Source/cmFileCommand.cxx| 12 ++-- Source/cmFindPackageCommand.cxx | 4 +- Source/cmGlobalVisualStudio7Generator.cxx | 6 +- Source/cmMakefile.cxx | 5 +- Source/cmNinjaTargetGenerator.cxx | 2 +- Source/cmQtAutoGenerators.cxx | 2 +- Source/cmSetCommand.cxx | 6 +- Source/cmState.cxx | 5 +- Source/cmSystemTools.cxx| 11 +++- Source/cmTimestamp.cxx | 7 +-- Source/cmake.cxx| 16 ++--- Source/cmakemain.cxx| 38 ++- Source/cmcmd.cxx| 12 ++-- Source/kwsys/CMakeLists.txt | 2 + Source/kwsys/Directory.cxx | 2 +- Source/kwsys/FStream.hxx.in | 19 +- Source/kwsys/ProcessWin32.c | 25 +++- Source/kwsys/SystemInformation.cxx | 20 +++--- Source/kwsys/SystemTools.cxx| 98 +++-- Source/kwsys/SystemTools.hxx.in | 4 +- Source/kwsys/testSystemTools.cxx| 9 +-- 32 files changed, 260 insertions(+), 156 deletions(-) diff --git a/Source/CPack/cmCPackGenerator.cxx b/Source/CPack/cmCPackGenerator.cxx index df8bb0f..76609e1 100644 --- a/Source/CPack/cmCPackGenerator.cxx +++ b/Source/CPack/cmCPackGenerator.cxx @@ -1074,11 +1074,11 @@ const char* cmCPackGenerator::GetInstallPath() return this->InstallPath.c_str(); } #if defined(_WIN32) && !defined(__CYGWIN__) - const char* prgfiles = cmsys::SystemTools::GetEnv("ProgramFiles"); - const char* sysDrive = cmsys::SystemTools::GetEnv("SystemDrive"); - if (prgfiles) { + std::string prgfiles; + std::string sysDrive; + if (cmsys::SystemTools::GetEnv("ProgramFiles", prgfiles)) { this->InstallPath = prgfiles; - } else if (sysDrive) { + } else if (cmsys::SystemTools::GetEnv("SystemDrive", sysDrive)) { this->InstallPath = sysDrive; this->InstallPath += "/Program Files"; } else { diff --git a/Source/CTest/cmCTestCoverageHandler.cxx b/Source/CTest/cmCTestCoverageHandler.cxx index 7102533..9410a52 100644 --- a/Source/CTest/cmCTestCoverageHandler.cxx +++ b/Source/CTest/cmCTestCoverageHandler.cxx @@ -727,10 +727,7 @@ int cmCTestCoverageHandler::HandleCoberturaCoverage( // if it doesn't exist or is empty, assume the // binary directory is used. std::string coverageXMLFile; - const char* covDir = cmSystemTools::GetEnv("COBERTURADIR"); - if (covDir && strlen(covDir) != 0) { -coverageXMLFile = std::string(covDir); - } else { + if (!cmSystemTools::GetEnv("COBERTURADIR", coverageXMLFile) || coverageXMLFile.empty()) { coverageXMLFile = this->CTest->GetBinaryDir(); } // build the find file string with the directory from above @@ -791,7 +788,8 @@ struct cmCTestCoverageHandlerLocale { cmCTestCoverageHandlerLocale() { -if (const char* l = cmSystemTools::GetEnv("LC_ALL")) { +std::string l; +if (cmSystemTools::GetEnv("LC_ALL", l)) { lc_all = l; } if (lc_all != "C") { @@ -2121,8 +2119,8 @@ int cmCTestCoverageHandler::RunBullseyeSourceSummary( int cmCTestCoverageHandler::HandleBullseyeCoverage( cmCTestCoverageHandlerContainer* cont) { - const char* covfile = cmSystemTools::GetEnv("COVFILE"); - if (!covfile || strlen(covfile) == 0) { + std::string covfile; + if (!cmSystemTools::GetEnv("COVFILE", covfile) || covfile.empty()) { cmCTestOptionalLog(this->CTest, HANDLER_VERBOSE_OUTPUT, " COVFILE environment variable not found, not running " " bullseye\n", diff --git a/Source/CTest/cmCTestCurl.cxx b/Source/CTest/cmCTestCurl.cxx index
Re: [cmake-developers] [PATCH] Improve encoding handling on Windows
2016-07-01 17:41 GMT+03:00 Mike Gelfand: > > Since you already have "bool SystemTools::GetEnv(const char* key, > std::string& result)", another option would be to use it everywhere and > maybe introduce something like "bool SystemTools::HasEnv(const char* > key)" for those several cases where you only need to check the existence. > > I implemented this, it's actually really nice API to work with. 2016-07-02 1:54 GMT+03:00 : > > ... > > > +#if defined(_WIN32) > > + // Kinda hack, with MSVC (and MinGW) for some reason std::wcout > > + // (and all other std::w*) fails once it encounters non-ASCII > > + // string unless locale is set. > > + // Note that even with this, seems it can't output characters > > + // which aren't present in ANSI codepage no matter what encoding > > + // is used for console. > > + // Also once any character outside of ANSI codepage is tried to > > + // be outputed then after there anymore won't be output from > > + // any of std::w* functions. > > + _wsetlocale(LC_ALL, L""); > > +#endif > > The _wsetlocale() may be a problem. > > See: > https://gitlab.kitware.com/cmake/cmake/commit/87be2e142 > https://cmake.org/Bug/view.php?id=16099 > > > Indeed, good catch, thanks, I didn't thought about this. But even now most these locale aware functions like tolower/toupper and others are wrong because internally we use UTF-8 and there 1 character can take more than 1 byte so these functions won't work correctly for some strings even if we don't set any locale. Now here we actually set it only on Windows because there just isn't any other way. Without setting locale we get only ASCII support and can't output even ANSI characters. With locale we can atleast output ANSI characters. Currently Microsoft C++ library doesn't support UTF-8/UTF-16 locales. Only way to output Unicode would be implement our own std::wcout which would use wide WinAPI to write to console. Anyway quick fix is to always use English locale then number parsing will be expected and set user's codepage. std::wstring locale = L"English_United States."; locale += std::to_wstring(GetACP()); _wsetlocale(LC_ALL, locale.c_str()); Of course proper Unicode support will be needed some day, but for now this is still an improvement. > @@ -30,8 +30,23 @@ namespace @KWSYS_NAMESPACE@ > > typedef std::basic_filebuf my_base_type; > > basic_filebuf *open(char const *s,std::ios_base::openmode mode) > > { > > +std::wstring wstr = Encoding::ToWide(s); > > +const wchar_t *ws = wstr.c_str(); > > +#if defined(_MSC_VER) && _MSC_VER >= 1400 > > +const wchar_t *ss = ws; > > +#else > > +const char *ss = 0; > > +size_t length = std::wcstombs(0, ws, 0); > > +if (length != size_t(-1)) { > > + std::vector str(length + 1); > > + ss = str.data(); > > + std::wcstombs((char *)ss, ws, str.size()); > > +} else { > > + ss = s; > > +} > > +#endif > > return static_cast ( > > - my_base_type::open(Encoding::ToWide(s).c_str(), mode) > > + my_base_type::open(ss, mode) > > ); > > Yes. It makes sense to convert from utf-8 to code page when we are not > using the wide apis. > This would at least give you support for the current code page, beyond > ascii. Beyond that, the wide functions should be used. > Which compiler are you trying to support here, which doesn't give a wide > open()? > > Only MSVC have ofstream::open(const wchar_t *) so for MinGW need to use ofstream::open(const char *) or use wofstream::open(const wchar_t *) which would require quite big changes. > > } > > }; > > diff --git a/Source/kwsys/ProcessWin32.c b/Source/kwsys/ProcessWin32.c > > index 2b93e69..208e725 100644 > > --- a/Source/kwsys/ProcessWin32.c > > +++ b/Source/kwsys/ProcessWin32.c > > @@ -181,7 +181,7 @@ struct kwsysProcessPipeData_s > > /* - Data managed per call to Execute - */ > > > > /* Buffer for data read in this pipe's thread. */ > > - char DataBuffer[KWSYSPE_PIPE_BUFFER_SIZE]; > > + char DataBuffer[KWSYSPE_PIPE_BUFFER_SIZE*2]; > > This "*2" assumes all characters cmake sees will fit in the Basic > Multilingual Plane. > Are we OK assuming that? > If the conversion from a code page to utf-16 results in surrogate pairs, > there may not be enough space. > > Not really, it assumes character string from ANSI code page encoded to internal encoding (UTF-8) will not take more than 2x space. Which I think is a pretty good guess, because here we're processing output from a process and so output will be paths and texts usually in ASCII with not many characters which would require more than 2 bytes. And it will still work if say 1/4th uses 4 bytes and 1 byte for rest 3/4th. 1024 * 1/4 * 4 bytes + 1024 * 3/4 * 1 byte (1792) < 1024 * 2 (2048) But of course can increase it to 4x, only I think it's very
[cmake-developers] Green Hills MULTI Generator Recommendations
I am interested in using CMake with Green Hills MULTI. I see that there is some experimental support for MULTI in CMake already, but that support appears to be tied to ARM and assumes the Green Hills INTEGRITY RTOS is also being used. I would also like to build for the x86 and Renesas V850 architectures. I have hacked in support for other targets just to get something working locally, but I could use some recommendations on how to proceed so I can submit patches to the CMake project. I have a couple of questions. 1. GHS MULTI supports multiple processor architectures, but each of the architectures use different compiler/linker names. As an example, ARM uses ccarm/cxarm but the v850 uses ccv850/cxv850. Should the MULTI generator provide a list of known compiler names by architecture, or should we rely on users to set the appropriate CMAKE__COMPILER variables in the toolchain file? If the generator should provide a list of known compiler names, would CMake's -A option be the appropriate way to specify the target processor's platform? Is there another compiler toolchain that might be a good template to copy from? 2. Is there a way in a toolchain file or on the command line to specify the directory where CMake should look for the compiler executables? I tried creating a variable in a toolchain file, but it appears that CMake tries to figure out the CMAKE__COMPILER value before the toolchain variable I created is available. 3. Is there a way to specify the make executable full path in the toolchain file? GHS MULTI uses their own program called gbuild to drive their builds. At least for our team, it would not be a good idea for us to put the gbuild executable in the PATH variable because we will be working with more than one architecture and we would need to specify exactly which gbuild executable to use - we wouldn't want the ARM gbuild to run when we are building for x86. I tried setting CMAKE_MAKE_PROGRAM in the toolchain file, but the CMAKE_MAKE_PROGRAM variable I set in the toolchain file does not appear to be available at the time FindMakeProgram() in the generator is called. -- Powered by www.kitware.com Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ Kitware offers various services to support the CMake community. For more information on each offering, please visit: CMake Support: http://cmake.org/cmake/help/support.html CMake Consulting: http://cmake.org/cmake/help/consulting.html CMake Training Courses: http://cmake.org/cmake/help/training.html Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html Follow this link to subscribe/unsubscribe: http://public.kitware.com/mailman/listinfo/cmake-developers