On Sun, Feb 13, 2011 at 12:09:32AM +0000, Philip Martin wrote:
> Philip Martin <philip.mar...@wandisco.com> writes:
>
> > Danny Trebbien <dtrebb...@gmail.com> writes:
> >
> >>> I had to look that up, it's Esperanto. Why do you want that one? What
> >>> are you testing? It doesn't appear to be available on my machine. Do
> >>> you just want a non-utf8 locale?
> >>
> >> Yes, I basically just want a non-UTF-8 locale. Which one do you suggest?

Danny, what is this new test for? (Sorry if this has been explained in
some other thread -- I briefly checked and couldn't find anything.)
And are you aware of utf8_tests.py and the related issue?
http://subversion.tigris.org/issues/show_bug.cgi?id=2079
Will your new test have similar problems?

Instead of printing a warning, I would suggest skipping the test if the
locale cannot be configured.

> > How about trying a list? en_US.8859-1 is widely available even if not
> > actually installed. I don't know which non-utf8 locales are widely
> > installed, personally I have en_GB.8859-1. We have several German devs
> > so maybe de_DE.8859-1? Perhaps jp_JP.EUC-JP?
>
> I see utf8_tests.py is using ".1251" and "en_US.ISO8859-1".

It isn't easy to find a locale that works on all platforms. Locales are
poorly standardized -- a lot of things, including the names of locales,
are left to the implementation. AFAIK the only thing that is standardized
is the C99 API for using locales (setlocale() etc.). There are various
implementations, and they differ greatly in the number of character
encodings and languages they provide.

You'll mostly have to consider all operating systems that svn developers
use, which is Windows (not sure if that even has a locale concept),
various Linux distributions (some of which allow users to configure the
set of available locales, so that they might only have e.g. en_US.UTF-8),
Mac OS X, and all of the BSD systems.

I don't think any of us is using Solaris or other proprietary UNIX
systems, though you should strive to allow your test to run there, too.
Some users run the tests to verify their own binary builds, and those
people might well be using proprietary UNIX.

So if you must do this, please use some latin-1 locale. That's likely a
good common denominator. If the language doesn't matter (it probably
doesn't if all you want to test is encoding conversion), I'd recommend
using en_US.ISO88591. And if that isn't there, just skip the test.
Note that this may be called en_US.ISO8859-1 on some systems (e.g. BSD).
There may be other naming variations; I'm not sure.

The Esperanto locale is definitely a bad idea. I would expect that none
of the BSD systems have it, for instance.
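
To make the "try a few names, otherwise skip" idea concrete, here is a rough sketch in Python. The function name and the candidate list are made up for illustration (the names are just the spellings mentioned in this thread), and it only probes LC_CTYPE, on the assumption that encoding conversion is all the test cares about. It is not taken from the existing test suite.

```python
import locale

# Hypothetical candidate list: latin-1 spellings mentioned in this thread.
CANDIDATE_LOCALES = ("en_US.ISO88591", "en_US.ISO8859-1", "en_US.8859-1")


def find_available_locale(candidates=CANDIDATE_LOCALES,
                          category=locale.LC_CTYPE):
    """Return the first name in candidates that setlocale() accepts.

    Returns None if none of them can be configured. The locale that was
    active on entry is restored before returning.
    """
    # Calling setlocale() without a name only queries the current setting.
    saved = locale.setlocale(category)
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                # Not available under this spelling; try the next one.
                continue
            return name
        return None
    finally:
        # Always put the original locale back, whatever happened above.
        locale.setlocale(category, saved)
```

A test would call find_available_locale() first and skip itself, rather than print a warning, when the result is None.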