Hi,

Sorry, it looks like this thread is not progressing in a calm and reasoned manner, the way it was meant to be. And I'm very much to blame. So I apologise for the strong language and passionate opinions. I'm deleting most of what I had written as a reply so we can start over.

Let's start with your questions:

On Saturday, 16 November 2019 10:50:13 PST André Pönitz wrote:
> You have not yet answered
>
> - why this decision was made

You know, I don't know. To be frank, I don't know that a decision *was* made.

It all started with a change (see OP) about removing QTextCodec from the API and from QtCore. It seemed reasonable enough but it turned up quite a few kinks that hadn't been predicted. One of them, which may still be a showstopper, is QXmlStreamReader's inability to handle XML data encoded in anything except UTF-8, though a thorough search of all XML files on my system turned up exactly zero such files.

I don't know why QTextCodec is being removed. I don't remember any decisions in prior QtCS or on this mailing list about removing it. We definitely discussed removing the CJK codecs and their big tables and that can still be done, with no effect on the API, since QTextCodec is backed by ICU's ucnv. We may have discussed removing it, but I don't remember a firm decision. And even if it is firm, after looking at the consequences of doing so, we may want to reverse our decision.

Related to that is the discussion of whether UTF-8 is the only acceptable locale on Unix systems. If we don't have QTextCodec, then we have to have something fixed for QString::fromLocal8Bit and it would necessarily be UTF-8. But even if we do have QTextCodec, that's still a reasonable question: should we assume it is UTF-8? And should we enforce it? Those were the questions in my OP.

> - who did it

Considering I don't know a decision *was* made, I don't think we can say who made it.

> - what the actual problem to solve was

Three things being tackled, all related:

1) QTextCodec in the API

I think we cannot do without it, it'll have to stay in one way or another. So the question reduces to whether it should stay in QtCore or be moved to another library. Given the QXmlStreamReader problem above, it's probably best to keep it in QtCore, actually.

QTextCodec has some API limitations but they can be fixed. It's not necessary for us to remove it: it's not *that* broken.

2) QtCore size

As I said above, removing the legacy codecs we have code for is not a problem. They are already disabled in Qt builds where ICU is present, so we'd additionally remove them from all other builds. Where ICU is present, there's no loss of functionality for user applications, since ICU provides far more codecs than we do. For those without ICU, it stands to reason that the user chose size so they are aware of the limitations. Plus, one can always instantiate their own QTextCodec and add to the list (at least, with today's implementation).

If QTextCodec is not in QtCore, then most likely you can't affect how QtCore and almost all other Qt classes decode 8-bit data into QString, including QTextStream.

and 3) misconfigured locale systems and filename handling

This is probably the biggest problem. As it is right now, when the locale isn't set on a Unix system or if it is explicitly set to C, we *cannot* decode any file names with the 8th bit set. Those file names are considered filesystem corruption. And yet they are quite commonly created by the user outside of English-speaking jurisdictions.

Your example of setting LC_ALL (or another environment variable) to force the locale to print something that either can be parsed or shared is one such problematic scenario. On one hand, you may need it to get some older tools to parse output; on the other, it makes Qt applications unable to even see some files exist.

> - why LC_*ALL* comes into play

Because it's the override. If we decide to override and LC_ALL is set, then we have no choice but to override it. If it is unset, then we can leave it unset too, but may need to override LC_CTYPE.

> I get the impression that this thread was not started as an RFC for an
> open-ended discussion, but as a staged attempt to provide a figleaf for
> a pre-determined decision.

That was not the intention.
That's why I am re-starting it so we can come back to a reasoned approach.

Anyway, the two independent (but related) decisions we need to make are:

1) do we keep QTextCodec in QtCore?
2) do we want to change how we handle legacy (non-UTF-8) locales?

For #2, the sub-questions of the OP apply:

a) What should Qt 6 assume the locale to be, if no locale is set?
b) In case a non-UTF-8 locale is set, what should we do?
c) Should we propagate our decision to child processes?

My preferences were:

a) C.UTF-8
b) override it to force UTF-8 on the same locale
c) yes

The reason for my preference in propagating to child processes is so that we have a consistent protocol between parent and child. Moreover, the mechanism for propagating to the child process is the same one that prevents other code in the same library from accidentally undoing our override (due to 2.b): qputenv.

I don't think that assuming the locale to be UTF-8 without using setlocale() to inform the C library of that is acceptable. It would mean strerror() would produce mojibake for us -- and since QString::fromLocal8Bit doesn't take kindly to mojibake, in most languages qt_error_string() would return empty for any and all error conditions. Just try ENOENT in ja_JP.

Going further, I think that if we change "ja_JP" to "ja_JP.UTF-8", we should set it in the environment so that the child processes will produce "そのようなファイルやディレクトリはありません" for ENOENT instead of undecodable mojibake.

Turns out, there's one locale whose non-UTF-8 default we can be sure is decodable under UTF-8, and that's the "C" locale. So we don't *have* to qputenv "C.UTF-8" if the locale is explicitly "C" (as opposed to being unset). But I think we should.

My arguments are that UTF-8 locales are the default in all desktop Linux distributions, all BSDs and on macOS and have been for 15 years. Most embedded systems from the last 5 years at least also have it as the default, especially those with graphical HMIs and most especially those using Qt for that.
Any applications that had problems with UTF-8 must have been fixed for a long time and those that didn't are almost certainly launched from wrappers that set a suitable environment for them, either via QProcessEnvironment, execle, a shell script, or some other mechanism.

Moreover, setting the locale to non-UTF-8 on a Qt 4 or 5 application on a system with UTF-8-encoded file names is just *wrong* and asking for trouble, for the filesystem reasons stated above. Just as an example, think of an embedded system with a multimedia player that reads a FAT32-formatted USB stick: it wouldn't go very far if it couldn't even see the music files with non-ASCII characters in them. So I feel confident when I say applications being ported to Qt 6 are not subject to that problem. Therefore, our resetting of the environment inside the Qt 6 application is not going to affect the child processes.

But if we disagree and think we shouldn't qputenv, I still think we should assume by default the locale *is* UTF-8, even if the environment tells us it isn't (an explicit LANG=ja_JP for example, but much more commonly an LC_ALL=C override). The changing of the encoding is usually an undesired side-effect, not an intentional choice. That is to say, LANG=ja_JP was actually meant to be LANG=ja_JP.UTF-8 and LC_ALL=C could have been for the parsing reasons you brought up.

If we don't do the qputenv(), we'll still setlocale() in QCoreApplication so qt_error_string() produces output and we'll live with the danger that some code undoes our choice. My search through Linux library code found no instance of a permanent setlocale() call with a non-null second parameter (Qt is actually the only exception).

I hope this clarifies things and we're back at a rational discussion.

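PS: to make preference (b) and the LC_ALL / LC_CTYPE rule above concrete, here is a rough sketch of what the override could look like. It is plain C++ with POSIX setenv() standing in for qputenv(); forceUtf8() and applyUtf8Override() are names made up for illustration, not Qt API. It also assumes the C library actually provides the resulting locale, which is not guaranteed (C.UTF-8 in particular is not available everywhere).

```cpp
#include <cassert>
#include <clocale>
#include <string>
#include <stdlib.h>   // POSIX getenv()/setenv()

// Hypothetical helper, not Qt API: rewrite a locale name of the form
// "lang[_TERRITORY][.codeset][@modifier]" so that its codeset is UTF-8.
// An unset (empty) name, "C" and "POSIX" all map to "C.UTF-8".
std::string forceUtf8(const std::string &name)
{
    if (name.empty() || name == "C" || name == "POSIX")
        return "C.UTF-8";

    std::string base = name;
    std::string modifier;
    const std::string::size_type at = base.find('@');
    if (at != std::string::npos) {
        modifier = base.substr(at);   // keep "@euro" and friends
        base.erase(at);
    }
    const std::string::size_type dot = base.find('.');
    if (dot != std::string::npos)
        base.erase(dot);              // drop the old codeset
    return base + ".UTF-8" + modifier;
}

// Read an environment variable, treating "unset" and "empty" alike.
static std::string envValue(const char *name)
{
    const char *value = getenv(name);
    return value ? value : "";
}

// Hypothetical sketch of the override: if LC_ALL is set we must override
// LC_ALL itself; otherwise we leave it unset and override LC_CTYPE only.
// Returns the locale name that was written to the environment.
std::string applyUtf8Override()
{
    const std::string lcAll = envValue("LC_ALL");

    // Usual precedence for the character type: LC_ALL, LC_CTYPE, LANG.
    std::string current = lcAll;
    if (current.empty())
        current = envValue("LC_CTYPE");
    if (current.empty())
        current = envValue("LANG");

    const std::string utf8 = forceUtf8(current);
    assert(utf8.find(".UTF-8") != std::string::npos);

    // Writing to the environment is what propagates the choice to child
    // processes and to any later setlocale(..., "") call in this process.
    setenv(lcAll.empty() ? "LC_CTYPE" : "LC_ALL", utf8.c_str(), 1);

    // Tell the C library. This returns a null pointer if the locale is
    // not installed; a real implementation would need a fallback here.
    std::setlocale(LC_ALL, "");
    return utf8;
}
```

With LANG=ja_JP and nothing else set, this writes LC_CTYPE=ja_JP.UTF-8; with LC_ALL=C it rewrites LC_ALL to C.UTF-8. Whether overwriting LC_ALL should also keep the other categories at their previous values is left open in this sketch.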
--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products