>> Yes, I basically just want a non-UTF-8 locale. Which one do you suggest?
>
> Danny, what is this new test for (sorry if this has been explained in
> some other thread -- I briefly checked and couldn't find any).

Hi Stefan,

This is for a new test. See:
http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125782

> And are you aware of utf8_tests.py and the related issue?
> http://subversion.tigris.org/issues/show_bug.cgi?id=2079
> Will your new test have similar problems?

I was not aware of this issue, but I don't think that my new test has
the same problem. For one thing, the definition of the variable
`i18n_filename` in `subversion/tests/cmdline/utf8_tests.py` and the way
that it is used assume that the system-default character encoding is
ISO-8859-1. If it isn't, then `i18n_filename` is either invalid in the
current encoding or represents a different sequence of characters
entirely.

In my test, I was careful to pick a byte (0xc6) that represents the same
character (Æ) whether it is decoded as CP-1252 or as ISO-8859-1. Also,
the test sets the locale to one that uses either of those two encodings,
and skips itself if it is unable to do so.

> Instead of printing a warning, I would suggest to skip the test if
> the locale cannot be configured.

That's what it does.

>> > How about trying a list? en_US.8859-1 is widely available even if not
>> > actually installed. I don't know which non-utf8 locales are widely
>> > installed, personally I have en_GB.8859-1. We have several German devs
>> > so maybe de_DE.8859-1? Perhaps jp_JP.EUC-JP?
>>
>> I see utf8_tests.py is using ".1251" and "en_US.ISO8859-1".
>
> It isn't easy to find a locale that works on all platforms.
>
> Locales are poorly standardized -- a lot things, including names of locales,
> are left to the implementation. AFAIK the only thing that is standardized is
> the C99 API for using locales (setlocale() etc.). There are various
> implementations and they differ greatly in the amount of character
> encodings and languages provided.
>
> You'll mostly have to consider all operating systems that svn developers
> use, which is Windows (not sure if that even has a locale concept),
> various Linux distributions (some of which allow users to configure the
> set of available locales, so that they might only have e.g. en_US.UTF-8),
> Mac OS X, and all of the BSD systems.
>
> I don't think any of us is using Solaris or other proprietary UNIX systems.
> Though you should strive to allow your test to run there, too.
> Some users run the tests to verify their own binary builds, and those
> people might well be using proprietary UNIX.
>
> So if you must do this, please use some latin-1 locale.
> That's likely a good common denominator.
> If the language doesn't matter (it probably doesn't if all you want to
> test is encoding conversion), I'd recommend using en_US.ISO88591.
> And if that isn't there, just skip the test.
> Note that this may be called en_US.ISO8859-1 on some systems (e.g. BSD).
> Maybe there are other naming variations, not sure.
>
> The esperanto locale is definitely a bad idea. I would expect that
> none of the BSD systems have it, for instance.

Per Philip's suggestion, I have switched the test to trying a list of
locales. I have not tested it on Windows, but I am using the examples
from MSDN verbatim, so it should work. I try the Windows-specific locale
strings first because I know that a Linux system rejects them harmlessly
and the test moves on to the next name.

For example, with the following code the locale ends up being
en_US.ISO-8859-1 on my Debian Linux "Squeeze" system, which supports the
locales C, en_US, en_US.iso88591, en_US.utf8, eo, eo.iso88593, eo.utf8,
and POSIX:

  /* Try the Windows names first, then the POSIX-style names; skip the
     test if none of them can be set. */
  if ((! setlocale(LC_ALL, "English.1252")) &&
      (! setlocale(LC_ALL, "German.1252")) &&
      (! setlocale(LC_ALL, "French.1252")) &&
      (! setlocale(LC_ALL, "en_US.ISO-8859-1")) &&
      (! setlocale(LC_ALL, "en_GB.ISO-8859-1")) &&
      (! setlocale(LC_ALL, "de_DE.ISO-8859-1")))
    return svn_error_createf(SVN_ERR_TEST_SKIPPED, NULL,
                             "None of the locales English.1252, "
                             "German.1252, French.1252, "
                             "en_US.ISO-8859-1, en_GB.ISO-8859-1, and "
                             "de_DE.ISO-8859-1 are installed.");
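
If the list keeps growing (for instance to cover spellings such as
en_US.ISO8859-1), the same fallback could be written as a loop over a
table of names. This is only a rough sketch: first_available_locale() is
a hypothetical helper that does not exist in Subversion or in the patch,
and it relies on nothing beyond the standard setlocale() behaviour of
returning NULL when a name cannot be set.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of Subversion): try each locale name in
 * order and return the first one that setlocale() accepts, or NULL if
 * none of them can be set. */
static const char *
first_available_locale(int category, const char *const *names, size_t count)
{
  size_t i;

  assert(names != NULL);
  for (i = 0; i < count; i++)
    {
      /* setlocale() returns NULL and leaves the current locale
       * unchanged when the requested name is not available. */
      if (setlocale(category, names[i]) != NULL)
        return names[i];
    }
  return NULL;
}
```

The test would pass the six names above with LC_ALL and return
SVN_ERR_TEST_SKIPPED when the result is NULL. Returning the accepted
name also makes it easy to log which locale was actually selected.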