>> >> Yes, I basically just want a non-UTF-8 locale.  Which one do you suggest?
>
> Danny, what is this new test for (sorry if this has been explained in
> some other thread -- I briefly checked and couldn't find any).

Hi Stefan,

This is for a new test.  See:
http://thread.gmane.org/gmane.comp.version-control.subversion.devel/125782

> And are you aware of utf8_tests.py and the related issue?
> http://subversion.tigris.org/issues/show_bug.cgi?id=2079
> Will your new test have similar problems?

I was not aware of this issue, but I don't think that my new test has
the same problem.  For one thing, the definition of the variable
`i18n_filename` in `subversion/tests/cmdline/utf8_tests.py` and the
way that it is used assumes that the system-default character encoding
is ISO-8859-1.  If it isn't, then `i18n_filename` is either invalid
in the current encoding or represents a different sequence of
characters entirely.

In my test, I was careful to pick a single byte (0xc6) that represents
the same character (Æ) whether it is read as CP-1252 or as ISO-8859-1.
Also, the test sets the locale to one that uses either of those two
encodings, and is skipped if no such locale can be set.
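
To illustrate why 0xc6 is a safe choice, here is a minimal sketch.  It
is not code from the test, and the function names are made up for this
example.  ISO-8859-1 maps each byte to the Unicode code point of the
same value, and CP-1252 differs from it only in the range 0x80-0x9F, so
0xc6 decodes to U+00C6 under both and becomes the two bytes 0xC3 0x86
in UTF-8:

```c
#include <stddef.h>

/* ISO-8859-1 maps every byte value to the Unicode code point with the
   same number, so decoding is a plain widening conversion.
   (Hypothetical helper, for illustration only.) */
unsigned int latin1_to_codepoint(unsigned char byte)
{
  return byte;
}

/* CP-1252 and ISO-8859-1 assign different meanings only to the bytes
   0x80..0x9F; every other byte denotes the same character in both. */
int same_in_latin1_and_cp1252(unsigned char byte)
{
  return byte < 0x80 || byte > 0x9F;
}

/* Encode a code point below U+0800 as UTF-8.  Returns the number of
   bytes written to OUT (1 or 2), or 0 if CP is out of that range. */
size_t codepoint_to_utf8(unsigned int cp, unsigned char out[2])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char)cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char)(0xC0 | (cp >> 6));
      out[1] = (unsigned char)(0x80 | (cp & 0x3F));
      return 2;
    }
  return 0;
}
```

A byte in 0x80-0x9F would not have this property, which is why I
avoided that range.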

> Instead of printing a warning, I would suggest to skip the test if
> the locale cannot be configured.

That's what it does.

>> > How about trying a list?  en_US.8859-1 is widely available even if not
>> > actually installed.  I don't know which non-utf8 locales are widely
>> > installed, personally I have en_GB.8859-1.  We have several German devs
>> > so maybe de_DE.8859-1?  Perhaps jp_JP.EUC-JP?
>>
>> I see utf8_tests.py is using ".1251" and "en_US.ISO8859-1".
>
> It isn't easy to find a locale that works on all platforms.
>
> Locales are poorly standardized -- a lot of things, including names of locales,
> are left to the implementation. AFAIK the only thing that is standardized is
> the C99 API for using locales (setlocale() etc.). There are various
> implementations and they differ greatly in the amount of character
> encodings and languages provided.
>
> You'll mostly have to consider all operating systems that svn developers
> use, which is Windows (not sure if that even has a locale concept),
> various Linux distributions (some of which allow users to configure the
> set of available locales, so that they might only have e.g. en_US.UTF-8),
> Mac OS X, and all of the BSD systems.
>
> I don't think any of us is using Solaris or other proprietary UNIX systems.
> Though you should strive to allow your test to run there, too.
> Some users run the tests to verify their own binary builds, and those
> people might well be using proprietary UNIX.
>
> So if you must do this, please use some latin-1 locale.
> That's likely a good common denominator.
> If the language doesn't matter (it probably doesn't if all you want to
> test is encoding conversion), I'd recommend using en_US.ISO88591.
> And if that isn't there, just skip the test.
> Note that this may be called en_US.ISO8859-1 on some systems (e.g. BSD).
> Maybe there are other naming variations, not sure.
>
> The esperanto locale is definitely a bad idea. I would expect that
> none of the BSD systems have it, for instance.

Per Philip's suggestion, I have switched the test to try a list of
locales.  I have not tested it on Windows, but the Windows locale
strings are taken verbatim from examples on MSDN, so I expect it to
work.

I also try the Windows-specific locale strings first because
setlocale() simply fails for them on Linux, so the next candidate is
tried.  Using the following, for example, the locale ends up being
en_US.ISO-8859-1 on my Debian Linux "Squeeze" system, which supports
the locales C, en_US, en_US.iso88591, en_US.utf8, eo, eo.iso88593,
eo.utf8, and POSIX:

   if ((! setlocale(LC_ALL, "English.1252")) &&
       (! setlocale(LC_ALL, "German.1252")) &&
       (! setlocale(LC_ALL, "French.1252")) &&
       (! setlocale(LC_ALL, "en_US.ISO-8859-1")) &&
       (! setlocale(LC_ALL, "en_GB.ISO-8859-1")) &&
       (! setlocale(LC_ALL, "de_DE.ISO-8859-1")))
     return svn_error_createf(SVN_ERR_TEST_SKIPPED, NULL,
                              "None of the locales English.1252, "
                              "German.1252, French.1252, "
                              "en_US.ISO-8859-1, en_GB.ISO-8859-1, and "
                              "de_DE.ISO-8859-1 are installed.");
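
The same fallback could also be written as a loop over a table of
names, which would make it easier to add other spellings such as
en_US.ISO8859-1 later.  This is only a sketch of the idea, not what the
patch does; `latin1_locales` and `first_available_locale` are
hypothetical names:

```c
#include <locale.h>
#include <stddef.h>

/* Candidate locale names: Windows-style strings first, then
   POSIX-style ones.  (Hypothetical table, same names as above.) */
const char *const latin1_locales[] = {
  "English.1252",
  "German.1252",
  "French.1252",
  "en_US.ISO-8859-1",
  "en_GB.ISO-8859-1",
  "de_DE.ISO-8859-1"
};

/* Try each of the COUNT names in NAMES in order.  Return the first one
   that setlocale() accepts (the process locale is then set to it), or
   NULL if none could be set. */
const char *first_available_locale(const char *const *names, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      if (setlocale(LC_ALL, names[i]) != NULL)
        return names[i];
    }
  return NULL;
}
```

The test would call this with the table above and return
SVN_ERR_TEST_SKIPPED when the result is NULL.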
