On Fri, Jan 20, 2023 at 05:16:43PM +0000, Simon McVittie wrote:
> On Fri, 20 Jan 2023 at 09:54:21 -0700, Anthony Fok wrote:
> > supposedly some older Chinese websites are still using "GBK" as
> > encoding, probably something like:
> >
> > <meta http-equiv="Content-Type" content="text/html;charset=gbk">
> >
> > which has less than 30,000 characters and thus a very limited subset
> > of Unicode. And, presumably not everyone has the know how to convert
> > to UTF-8, the Chinese government wants those unable to at least change
> > that meta tag to:
> >
> > <meta http-equiv="Content-Type" content="text/html;charset=gb18030">
>
> Sure, but neither of those actually require us to support GBK or GB
> 18030 as a system locale, only as something that iconv() (or whatever
> browsers actually use, which is probably their own thing) can convert
> into their preferred internal representation (which is almost certainly
> UTF-8, UTF-16 or UCS-4).

Those files need to be edited *somewhere*. If that somewhere is a Debian
desktop, then you also need editors that know how to write such files,
etc. Sometimes it's just easier if the whole thing uses the same encoding.

> Analogously, we've never supported using Windows-1252 (Microsoft's
> legacy Latin-1 variant) as a system locale encoding in some hypothetical
> locale like en_US.windows-1252, but HTML documents with
> text/html;charset=windows-1252 still work fine.

Windows-* encodings were native on Windows, and we only needed to be able
to read files that were generated on such systems. Here, instead, we're
talking about a government-mandated encoding that systems are expected to
support, not only to consume data but also to *produce* it. Windows-*
encodings never came with that requirement.

-- 
w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.