On Wed, Jan 18, 2023 at 6:30 PM Russ Allbery <r...@debian.org> wrote:
>
> Anthony Fok <f...@debian.org> writes:
>
> > I'm not asking you to spend any time working on GB18030; that would be
> > the job of Debian Chinese i18n/L10n team as well as the wider community
> > (glibc, libiconv, Qt, etc.) All I am asking you is to maintain the
> > status quo, and don't discount anything other than UTF-8 as legacy.
>
> This topic comes up a lot, and I'd love to put something in either Policy
> or the Developer's Reference proactively to at least explain what we know
> about what our users need and to point people at the right questions to
> ask if it's been another decade and they want to standardize on UTF-8
> again. Do you have an idea of something suitable we should say?

Hey Russ, thank you so much for your message!

Adam, I would like to apologize: while I still value Debian maintaining its existing support for the zh_CN.GB18030 locale, I did speak a bit too soon. I'll elaborate.

> I do think we probably should say more *somewhere* about making UTF-8 the
> default choice in most situations if you otherwise have no reason to
> choose anything specific.

I totally agree. Besides the Debian Policy, a fellow DD on the #debian-zh IRC channel (linked to Telegram) suggests that UTF-8 being the default should be mentioned in the Release Notes, probably with pointers to fuller documentation and instructions on how to manually add locales with legacy and other non-UTF-8 encodings (edit /etc/locale.gen and /etc/default/locale, then run locale-gen).

> For example, as you point out, files written in
> Chinese for Chinese people may or may not want to use UTF-8, but at this
> point I do think anything written in, say, French or German probably
> should just use UTF-8.

Totally agreed. And I should clarify: I would say that for the majority of end users in Mainland China, zh_CN.UTF-8 would still be the best default, though some government and financial institutions may require zh_CN.GB18030, probably for certain terminal applications. I don't know the percentage, though.

I asked around on #debian-zh last night for more feedback, and most existing users/developers definitely prefer UTF-8 and are using zh_CN.UTF-8. Some joked that those who choose zh_CN.GB18030 are the ones who like to create difficulties for themselves.

And while support for zh_CN.GB18030 as a "system locale" was apparently a requirement for GB 18030-2000 conformance testing some twenty years ago (I went through that period personally, when all Linux vendors made a mad dash to get it, as well as fonts and input methods, working properly), fellow Chinese DDs agree that it may have been a requirement 20 years ago but is no longer one today. They point out that all Chinese homegrown distros nowadays use LANG=zh_CN.UTF-8 by default and apparently still pass the GB 18030(-2005?) conformance tests, and they suggest that having the ability to read and write GB18030-encoded documents, and to convert between UTF-8 and GB18030 etc., should probably be sufficient.

I was initially unconvinced, but then I tested, in a virtual machine, ISO images from the latest releases of various Chinese homegrown Linux distributions, e.g. Deepin Linux, openKylin, and even Red Flag Desktop Linux, and they all use zh_CN.UTF-8 as the default system locale! (Red Flag does have the zh_CN.gb18030 locale precompiled, but then it seems to have all available locales precompiled according to "locale -a".)

Incidentally, Red Flag Desktop Linux is now based on Debian too! They used to co-develop the RHEL-based Asianux, on which they built their distro. What a pleasant surprise!

> Also, file names in the file system shipped in
> Debian packages probably should use UTF-8 since there's no way to declare
> the character set and there are some solid reasons for picking one and
> sticking with it. (Obviously, users can create files with any character
> set they want.)

Great point! Totally agreed.

> > Debian already supports GB 18030-2000 (or GB 18030-2005) rather well.
>
> How do I configure a locale that uses this as the default character set?
> I'd like to be able to test this configuration (at least for my own
> packages), but since recent changes to locales it doesn't appear to be an
> option in debconf and I was confused trying to figure out how I should
> make it work.

Good question! I somehow missed the removal of "legacy" encodings from the locales dpkg configuration menu... so that's why Adam was saying official support for legacy locales has indeed been dropped. (Thanks Adam! You were just stating the facts.)

Anyhow, to test how Debian and various desktop environments run with zh_CN.GB18030 as the system locale, here are the steps:

1. Create the file /usr/local/share/i18n/SUPPORTED containing the line

     zh_CN.GB18030 GB18030

   (I actually started by prepending that line before "zh_CN.UTF-8 UTF-8" in /var/lib/dpkg/info/locales.config, but then saw that it has a provision for a user-provided list of locales.)

2. Run "sudo dpkg-reconfigure locales"; you will then be able to select zh_CN.GB18030 and set it as the default locale.

3. Optionally, edit /etc/default/locale and make sure you have LANGUAGE=en or something similar so that you can still see the UI in English.

4. Reboot.

Alternatively, instead of running "dpkg-reconfigure locales", you may manually edit /etc/locale.gen, uncomment the line

  # zh_CN.GB18030 GB18030

therein, and run "sudo locale-gen".

I then went ahead and tested how Debian runs with zh_CN.GB18030 as the system locale under various desktop environments. The results:

* Crash upon startup: GNOME 43 and XFCE (Ouch!)
* KDE, LXDE, LXQt, Cinnamon, MATE: Start up normally.
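
For anyone reproducing the steps above, here is a minimal Python sketch (assuming a glibc-based system such as Debian, where locale.nl_langinfo is available) that checks whether zh_CN.GB18030 was actually generated and reports GB18030 as its codeset. It also illustrates the point made on #debian-zh that reading and converting GB18030 text only needs a codec, whatever the system locale is. The helper names locale_codeset, utf8_to_gb18030 and gb18030_to_utf8 are hypothetical, not part of any Debian tool.

```python
import locale
from typing import Optional


def locale_codeset(name: str) -> Optional[str]:
    """Return the codeset of locale `name` (e.g. "GB18030"), or None if
    that locale is not available (i.e. not generated) on this system."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current LC_CTYPE
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # not generated yet; see /etc/locale.gen and locale-gen
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the caller's locale


def utf8_to_gb18030(data: bytes) -> bytes:
    """Convert UTF-8 bytes to GB18030; does not depend on the system locale."""
    return data.decode("utf-8").encode("gb18030")


def gb18030_to_utf8(data: bytes) -> bytes:
    """Convert GB18030 bytes back to UTF-8."""
    return data.decode("gb18030").encode("utf-8")


if __name__ == "__main__":
    # Prints None if zh_CN.GB18030 has not been generated on this machine.
    print("zh_CN.GB18030 codeset:", locale_codeset("zh_CN.GB18030"))

    # Round-trip a string containing CJK text and a non-BMP character,
    # which GB18030 encodes as a four-byte sequence.
    text = "中文 GB18030 😀"
    gb = utf8_to_gb18030(text.encode("utf-8"))
    assert gb18030_to_utf8(gb) == text.encode("utf-8")
    print("GB18030 bytes:", gb.hex())
```

The locale check temporarily switches LC_CTYPE and restores it afterwards, so it can be called from a larger script without side effects; the conversion helpers work the same under zh_CN.UTF-8 or any other locale.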

As for terminals:

* GNOME Terminal: Crashes
* Console (kgx), Terminator: Do not crash, but support UTF-8 only
* LXTerminal: Follows the LANG setting and seemingly supports GB18030 fully
* Konsole: Full support for GB18030 and any other encoding

In conclusion: initially, before asking on #debian-zh and doing all the testing, I was going to suggest adding "zh_CN.GB18030" back to the locales configuration so that at least GB18030 conformance testers could easily choose it and let Debian pass the test. However, after seeing how GNOME 43 crashes under zh_CN.GB18030, and how the Chinese homegrown Linux distros have all switched to zh_CN.UTF-8 as the default system locale, I am starting to believe that setting zh_CN.GB18030 as the system locale is not a requirement for the GB 18030-2022 conformance tests (as my friends on #debian-zh were trying to tell me). So I am doing a full 180° and now think that what we have in Debian's locales package is perfect. (Maybe some of the menu text needs changing, as there are no longer any "legacy encodings" to choose from.)

So, all is good! I do hope that the GNOME 43 and XFCE crashes upon startup can be diagnosed and fixed, preferably by the upstream authors, but there is no urgency to do so now.

I apologize for the confusion that I created.

Cheers,
Anthony