On Wed, Jan 18, 2023 at 6:30 PM Russ Allbery <r...@debian.org> wrote:
>
> Anthony Fok <f...@debian.org> writes:
>
> > I'm not asking you to spend any time working on GB18030; that would be
> > the job of Debian Chinese i18n/L10n team as well as the wider community
> > (glibc, libiconv, Qt, etc.)  All I am asking you is to maintain the
> > status quo, and don't discount anything other than UTF-8 as legacy.
>
> This topic comes up a lot, and I'd love to put something in either Policy
> or the Developer's Reference proactively to at least explain what we know
> about what our users need and to point people at the right questions to
> ask if it's been another decade and they want to standardize on UTF-8
> again.  Do you have an idea of something suitable we should say?

Hey Russ, thank you so much for your message!

Adam, I would like to apologize; while I still value that Debian
maintains its existing support for the zh_CN.GB18030 locale, I did
speak a bit too soon.  I'll elaborate.

> I do think we probably should say more *somewhere* about making UTF-8 the
> default choice in most situations if you otherwise have no reason to
> choose anything specific.

I totally agree.  Besides Debian Policy, a fellow DD on the #debian-zh
IRC channel (linked with Telegram) suggests that UTF-8 being the
default should also be mentioned in the Release Notes, probably with
pointers to fuller documentation, including instructions on how to
manually add locales with legacy and other non-UTF-8 encodings: edit
/etc/locale.gen and /etc/default/locale, then run locale-gen.

> For example, as you point out, files written in
> Chinese for Chinese people may or may not want to use UTF-8, but at this
> point I do think anything written in, say, French or German probably
> should just use UTF-8.

Totally agreed.

And I should clarify: for the majority of end users in Mainland China,
zh_CN.UTF-8 would still be the best default, though some government
and financial institutions may require zh_CN.GB18030, probably for
certain terminal applications.  I don't know what percentage of users
that affects, though.

I asked around #debian-zh last night for more feedback, and most
existing users/developers definitely prefer UTF-8 and are using
zh_CN.UTF-8.  Some joked that those who choose zh_CN.GB18030 are the
ones who like to create difficulties for themselves.

And while support for zh_CN.GB18030 as a "system locale" was
apparently a requirement for GB 18030-2000 conformance testing some
twenty years ago (I went through that period personally, when there was
a mad dash by all Linux vendors to get that, as well as fonts and input
methods, working properly), fellow Chinese DDs agree that it could have
been a requirement 20 years ago, but no longer is today.  They note
that homegrown Chinese Linux distributions nowadays all use
LANG=zh_CN.UTF-8 by default and apparently still pass the
GB 18030(-2005?) conformance tests.  They suggest that being able to
read and write GB18030-encoded documents, and to convert between UTF-8
and GB18030 etc., should probably be sufficient.
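For what "being able to convert between UTF-8 and GB18030" looks like
in practice, here is a minimal sketch in Python, assuming only Python's
built-in "gb18030" codec (on the command line, glibc's
iconv -f GB18030 -t UTF-8 does the same job).  The function names and
the sample string are made up for illustration.

```python
# Minimal sketch: convert text between GB18030 and UTF-8 with Python's
# built-in "gb18030" codec.  Function names and sample text are
# hypothetical, chosen only for illustration.


def gb18030_to_utf8(data: bytes) -> bytes:
    """Decode GB18030-encoded bytes and re-encode them as UTF-8."""
    return data.decode("gb18030").encode("utf-8")


def utf8_to_gb18030(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as GB18030."""
    return data.decode("utf-8").encode("gb18030")


if __name__ == "__main__":
    sample = "中文测试"  # hypothetical sample text
    gb = sample.encode("gb18030")
    utf8 = gb18030_to_utf8(gb)
    # Common Hanzi take 2 bytes in GB18030 but 3 bytes in UTF-8.
    print(len(gb), len(utf8))  # prints: 8 12
    # The conversion is lossless in both directions.
    assert utf8_to_gb18030(utf8) == gb
```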

I was initially unconvinced, but then I tested, in a virtual machine,
ISO images of the latest releases of various homegrown Chinese Linux
distributions, e.g. Deepin Linux, openKylin, and even Red Flag Desktop
Linux, and they all use zh_CN.UTF-8 as the default system locale!
(Red Flag does have the zh_CN.gb18030 locale precompiled, but then it
seems to have all available locales precompiled, according to
"locale -a".)

Incidentally, Red Flag Desktop Linux is now based on Debian too!  They
used to co-develop the RHEL-based Asianux on which they built their
distro.  What a pleasant surprise!

> Also, file names in the file system shipped in
> Debian packages probably should use UTF-8 since there's no way to declare
> the character set and there are some solid reasons for picking one and
> sticking with it.  (Obviously, users can create files with any character
> set they want.)

Great point!  Totally agreed.
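As a rough illustration of sticking with UTF-8 for shipped file names,
a check along these lines could flag names that are not valid UTF-8.
This is a hypothetical Python sketch, not an existing Debian tool; the
sample paths are made up, and in practice the names might come from
something like os.walk() over an unpacked package tree, called with a
bytes path so the names stay as raw bytes.

```python
# Hypothetical sketch: report file names (as raw bytes, the way the
# kernel stores them) that are not valid UTF-8.  Not an existing Debian
# tool; the sample paths below are made up.
from typing import Iterable, List


def non_utf8_names(names: Iterable[bytes]) -> List[bytes]:
    """Return every name that cannot be decoded as UTF-8."""
    bad = []
    for name in names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    names = [
        "usr/share/doc/foo/中文.txt".encode("utf-8"),    # valid UTF-8
        "usr/share/doc/foo/中文.txt".encode("gb18030"),  # GB18030 bytes
    ]
    for name in non_utf8_names(names):
        print("not UTF-8:", name)
```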

> > Debian already supports GB 18030-2000 (or GB 18030-2005) rather well.
>
> How do I configure a locale that uses this as the default character set?
> I'd like to be able to test this configuration (at least for my own
> packages), but since recent changes to locales it doesn't appear to be an
> option in debconf and I was confused trying to figure out how I should
> make it work.

Good question!  I somehow missed the removal of "legacy" encodings
from the debconf menu of the locales package... so that's why Adam was
saying that official support for legacy locales has indeed been
dropped.  (Thanks, Adam!  You're just speaking the facts.)

Anyhow, to test how Debian and various desktop environments run with
zh_CN.GB18030 as the system locale, here are the steps:

 1. Create the /usr/local/share/i18n/SUPPORTED file with the line

        zh_CN.GB18030 GB18030

    (I actually started by prepending that line before "zh_CN.UTF-8 UTF-8"
    in /var/lib/dpkg/info/locales.config, but then saw that it has a
    provision for a user-provided list of locales.)

 2. Run "sudo dpkg-reconfigure locales" and you'll be able to select
    zh_CN.GB18030 and set it as the default locale.

 3. Optionally, edit /etc/default/locale and make sure you have
    LANGUAGE=en or something similar so you can still see the
    UI in English.

 4. Reboot.
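After the reboot, one way to confirm that the locale was actually
generated (besides "locale -a") is to ask the C library which codeset
it uses.  Here is a minimal Python sketch, assuming a glibc-based system
like Debian, where nl_langinfo(CODESET) should report "GB18030" for this
locale; the helper name is made up.

```python
# Minimal sketch (assumes a glibc-based system such as Debian): check
# whether a locale is installed and which character set it uses.
# The helper name locale_codeset is hypothetical.
import locale
from typing import Optional


def locale_codeset(name: str) -> Optional[str]:
    """Return the codeset of locale `name`, or None if it is unavailable."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # locale not generated/installed; setting unchanged
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore previous setting


if __name__ == "__main__":
    for name in ("zh_CN.GB18030", "zh_CN.UTF-8"):
        print(name, "->", locale_codeset(name))
```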

Alternatively, in lieu of running "dpkg-reconfigure locales", you may
also manually edit /etc/locale.gen, uncomment the line

    # zh_CN.GB18030 GB18030

therein, and run "sudo locale-gen".
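The manual /etc/locale.gen edit amounts to uncommenting one line.  Here
is a rough Python sketch of that text transformation, with a made-up
helper name; it only transforms a string, so writing the result back to
/etc/locale.gen (as root) and running "sudo locale-gen" are still left
to you.

```python
# Rough sketch: uncomment one entry in the text of /etc/locale.gen.
# The helper name enable_locale is hypothetical; it only transforms a
# string and does not touch the file system.


def enable_locale(locale_gen: str, entry: str) -> str:
    """Turn a line like '# zh_CN.GB18030 GB18030' into 'zh_CN.GB18030 GB18030'."""
    out = []
    for line in locale_gen.splitlines(keepends=True):
        stripped = line.strip()
        # Match a commented-out line whose remaining text is exactly `entry`.
        if stripped.startswith("#") and stripped[1:].strip() == entry:
            ending = "\n" if line.endswith("\n") else ""
            out.append(entry + ending)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = "# zh_CN.GB18030 GB18030\n# zh_CN.UTF-8 UTF-8\n"
    print(enable_locale(sample, "zh_CN.GB18030 GB18030"), end="")
```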

I then went ahead and tested how Debian runs with zh_CN.GB18030 as
the system locale under various desktop environments.

The result:
 * Crash upon starting: GNOME 43 and XFCE (Ouch!)
 * KDE, LXDE, LXQt, Cinnamon, MATE: Start up normally.

As for terminals:
 * GNOME Terminal: Crash
 * Console (kgx), Terminator: Do not crash but support UTF-8 only
 * LX Terminal: Follows LANG setting and seemingly supports GB18030 fully
 * Konsole: Full support for GB18030 and any other encodings
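To illustrate what "support UTF-8 only" means in practice, here is a
small Python sketch (the sample text is made up) of GB18030 output from
a program running under zh_CN.GB18030 being interpreted as UTF-8,
roughly what a UTF-8-only terminal has to cope with: the bytes are not
valid UTF-8, so they come out as replacement characters rather than
Chinese.

```python
# Small sketch: GB18030-encoded output decoded as UTF-8, roughly what a
# UTF-8-only terminal receives from a program running under
# zh_CN.GB18030.  The sample text is made up for illustration.
sample = "中文"
gb_bytes = sample.encode("gb18030")

as_utf8 = gb_bytes.decode("utf-8", errors="replace")
print(repr(as_utf8))  # only U+FFFD replacement characters, no Chinese

# Decoding with the right codec recovers the original text.
assert gb_bytes.decode("gb18030") == sample
```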

In conclusion:

Initially, before asking on #debian-zh and doing all the testing, I
was going to suggest adding "zh_CN.GB18030" back to the locales
configuration menu so that at least GB18030 conformance testers could
easily choose it and Debian could pass the test.

However, after seeing how GNOME 43 crashes under zh_CN.GB18030, and
how homegrown Chinese Linux distros have all switched to using
zh_CN.UTF-8 as the default system locale, I am starting to believe
that setting zh_CN.GB18030 as the system locale is not a requirement
for the GB 18030-2022 conformance tests (as my friends on #debian-zh
were trying to tell me).  So I am doing a full 180°: I now think what
we have in Debian's locales package is perfect.  (Perhaps some of the
menu text needs changing, as there are no longer any "legacy
encodings" to choose from.)

So, all is good!  Well, I do hope that the GNOME 43 and XFCE crashes on
startup can be diagnosed and fixed, preferably by the upstream
authors, but there is no urgency to do so now.

I apologize for the confusion I created.

Cheers,

Anthony
