[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Michael Warner changed:

What     |Removed |Added
See Also |        |https://bugs.documentfoundation.org/show_bug.cgi?id=151160

-- 
You are receiving this mail because:
You are the assignee for the bug.
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

خالد حسني changed:

What |Removed |Added
CC   |        |m0rt3z...@gmail.com

--- Comment #6 from خالد حسني ---
*** Bug 150567 has been marked as a duplicate of this bug. ***
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

خالد حسني changed:

What |Removed |Added
CC   |        |stsav...@gmail.com

--- Comment #5 from خالد حسني ---
*** Bug 145758 has been marked as a duplicate of this bug. ***
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Mike Kaganski changed:

What     |Removed |Added
See Also |        |https://bugs.documentfoundation.org/show_bug.cgi?id=145758
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Mike Kaganski changed:

What     |Removed |Added
See Also |        |https://bugs.documentfoundation.org/show_bug.cgi?id=150567
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Buovjaga changed:

What           |Removed     |Added
Ever confirmed |0           |1
CC             |            |ilmari.lauhakangas@libreoffice.org
Status         |UNCONFIRMED |NEW

--- Comment #4 from Buovjaga ---
(In reply to Jan-Marek Glogowski from comment #0)
> Steps to Reproduce:
> 1. Have a unicode / UTF8 file system (that's standard I guess)
> 2. Have a file name with non-ASCII characters (łąka.png - 'LC_ALL=C ls -b'
> will show the correct UTF8 encoding \305\202\304\205ka.png)
> 3. Start LO with LANG=C / LC_ALL=C
> 4. Open the file
> 5. Export the file
>
> Actual Results:
> 1. The file picker for "gen" shows the wrong file names. kde5 and gtk3 are
> fine.
> 2. After opening, the window title has the file name with a wrong encoding.
> 3. The recent file list has the file name with wrong encoding (which
> actually works!)
> 4. The save dialog has the wrong default name.

I can confirm this with gen. Step 4 should be "Insert - Image". I don't understand the mention of "after opening", "window title" and "recent file list", because they don't apply to inserted images.

Arch Linux 64-bit
Version: 7.1.0.0.alpha1+
Build ID: c9b320c32aceab7e22d381b688e7b44030e01c2d
CPU threads: 8; OS: Linux 5.9; UI render: default; VCL: x11
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 8 November 2020

___
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
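The reproduction steps depend on a file whose name is stored on disk as UTF-8 bytes. The Python sketch below is only an illustration, not LibreOffice code, and `ls_b_style` is a hypothetical helper. It creates such a file in a temporary directory and prints its name with every byte outside printable ASCII escaped as backslash-octal. On a typical Linux file system, this approximates the `LC_ALL=C ls -b` output quoted in step 2.

```python
import os
import tempfile

def ls_b_style(name_bytes: bytes) -> str:
    """Approximate `ls -b`: show bytes outside printable ASCII as \\ooo.

    Hypothetical helper; GNU ls escapes some additional characters
    that this sketch leaves alone.
    """
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\%03o" % b
        for b in name_bytes
    )

# "łąka.png" as UTF-8 bytes: ł = C5 82, ą = C4 85.
name = "łąka.png".encode("utf-8")

# Create the file through a bytes path, so no locale-dependent
# text conversion happens on the way to the file system.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(os.fsencode(tmp), name)
    open(path, "wb").close()
    for entry in os.listdir(os.fsencode(tmp)):  # bytes in, bytes out
        print(ls_b_style(entry))  # \305\202\304\205ka.png
```

The octal escapes are the UTF-8 bytes C5 82 C4 85, so the bytes on disk are the same whatever locale `ls` runs under. Only their interpretation changes.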
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Xisco Faulí changed:

What     |Removed |Added
CC       |        |xiscofa...@libreoffice.org
Keywords |        |needsDevAdvice
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #3 from Jan-Marek Glogowski ---
Thanks for the input. The gen file picker also works, as expected, if I set LANG=C.UTF-8. I had forgotten about that; thanks for the reminder.

(In reply to Stephan Bergmann from comment #1)
> (In reply to Jan-Marek Glogowski from comment #0)
> >
> I'm not sure why you qualify this issue with "currently". The behavior
> should be as it is ever since OOo.

OK. So I tried to find the definition of the C / POSIX locale and found:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
http://port70.net/~nsz/c/c89/c89-draft.html

What I couldn't find is any setting that has a meaning with regard to the filesystem encoding. If you say file names are just strings, then LC_CTYPE might apply, but IMHO that's a long shot today.

So the question arises: should (and can) we change the C locale to default to UTF-8 instead of ISO_8859_1 for interpreting file names, since UTF-8 is much more common today? I don't know whether that assumption actually holds. I'm also not sure this is feasible in a sensible manner in LO, but Qt and Gtk+ somehow make this assumption and act accordingly. I'm not aware of any other system API that could be queried for something like that. I tested GIMP on Window Maker with xterm and LANG=C. Still, there might be some system setting.
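The question above amounts to a policy: when the locale's codeset is plain ASCII (the C / POSIX locale), interpret file names as UTF-8 instead. The Python sketch below only illustrates such a policy. `filename_encoding` and `ASCII_CODESETS` are hypothetical, not LibreOffice, Qt or Gtk+ code. The list of codeset names is an assumption; glibc reports `ANSI_X3.4-1968` for the C locale. `locale.nl_langinfo` is available only on Unix.

```python
import locale

# Codeset names commonly reported for the C / POSIX locale.
# Assumed, non-exhaustive list; glibc reports "ANSI_X3.4-1968".
ASCII_CODESETS = {"ANSI_X3.4-1968", "ASCII", "US-ASCII", "646"}

def filename_encoding(codeset: str) -> str:
    """Hypothetical policy: treat an ASCII-only locale as UTF-8 for
    file names, otherwise use the locale's own codeset."""
    if codeset.upper() in ASCII_CODESETS:
        return "UTF-8"
    return codeset

if hasattr(locale, "nl_langinfo"):  # not available on Windows
    try:
        # Adopt LC_CTYPE from the environment (LC_ALL / LC_CTYPE / LANG).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment names a locale that is not installed
    codeset = locale.nl_langinfo(locale.CODESET)
    print(codeset, "->", filename_encoding(codeset))
```

For comparison, CPython 3.7 and later apply a similar rule to themselves (PEP 538 C locale coercion and PEP 540 UTF-8 mode). Under LANG=C, the last line may therefore already report a UTF-8 codeset.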
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #2 from Stephan Bergmann ---
(In reply to Stephan Bergmann from comment #1)
> Arguably, according to my above explanation, the gen file picker shows the
> right file name here. With LANG=C, osl_getThreadTextEncoding() effectively
> is RTL_TEXTENCODING_ISO_8859_1 (though technically it is
> RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.png".

(Above and below, Bugzilla apparently dropped the C1 control characters \U+0082 and \U+0085 from "ÅÄka.png", where they should appear after "Å" and after "Ä", respectively.)

> The kde5 and gtk3 file pickers presumably use external library code that
> doesn't follow LO's convention of interpreting pathnames' byte sequences
> according to the system locale, but instead always interpret them as UTF-8.
> That would explain why the kde5 file picker dialog shows the file's name as
> "łąka.png" instead of "ÅÄka.png". But once the kde5 file picker has passed
> the URL (which is the same URL as the gen
> file picker passes) to LO's internals, LO will again treat that as
> representing a pathname whose bytes are interpreted according to
> osl_getThreadTextEncoding().

Sorry, the above "which is the same URL as the gen file picker passes" is wrong. With LANG=C, LO interprets that file name as written with the characters

\U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\U+0082
\U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\U+0085
\U+006B LATIN SMALL LETTER K
...

and "LO internal file URLs" always have their "payload" encoded as UTF-8 (see udkapi/com/sun/star/uri/XExternalUriReferenceTranslator.idl), so the LO internal file URL that the gen file picker returns is .

(And when LO wants to access the actual file and converts that URL back to a pathname byte sequence under LANG=C, it first converts from the URL syntax "%C3%85%C2%82%C3%84%C2%85ka.png" to an OUString containing the code units

\U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
\U+0082
\U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
\U+0085
\U+006B LATIN SMALL LETTER K
...

and then, because of the osl_getThreadTextEncoding() mandated by LANG=C, to the correct byte sequence "\xC5\x82\xC4\x85ka.png".)
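The round trip described in comment #2 can be replayed with Python's codecs. This is a model, not LibreOffice code. It assumes that LANG=C behaves like ISO-8859-1 (technically RTL_TEXTENCODING_ASCII_US, as noted above). It also uses `urllib.parse.quote`/`unquote` in place of LO's own URL handling, with the payload percent-encoded as UTF-8.

```python
from urllib.parse import quote, unquote

# Bytes of the file name on disk: "łąka.png" encoded as UTF-8.
on_disk = b"\xc5\x82\xc4\x85ka.png"

# 1. Pathname bytes -> internal string, using the locale encoding.
#    ISO-8859-1 maps every byte to the code point with the same value.
internal = on_disk.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in internal[:5]])
# ['U+00C5', 'U+0082', 'U+00C4', 'U+0085', 'U+006B']

# 2. Internal string -> URL payload, percent-encoded as UTF-8.
payload = quote(internal, safe="", encoding="utf-8")
print(payload)  # %C3%85%C2%82%C3%84%C2%85ka.png

# 3. URL payload -> internal string -> pathname bytes, using the
#    locale encoding again; this recovers the original bytes exactly.
restored = unquote(payload, encoding="utf-8").encode("iso-8859-1")
assert restored == on_disk
```

U+0082 and U+0085 are C1 control characters with no visible glyph. This is why they vanish from the displayed "ÅÄka.png" while still being present in the string.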
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #1 from Stephan Bergmann ---
(In reply to Jan-Marek Glogowski from comment #0)
> Description:
> This is the "extension" of bug 125971.
>
> Something in the local file URL handling is currently broken when you use
> the C locale, at least on all unix backends. I can't test MacOS and Windows,
> but since I suspect an error in the URL handling with regard to the current
> locale setting, at least MacOS might be affected too. Has Windows some
> equivalent of C locale?

I'm not sure why you qualify this issue with "currently". The behavior should be as it has been ever since OOo.

A traditional Unix (incl. Linux) file name is just a sequence of bytes, without any means of specifying in what encoding to interpret those bytes. Ever since OOo was made Unicode-aware, it has chosen to represent pathnames internally as Unicode (UTF-16) strings (whether or not that was a good decision, its consequences permeate the code base, and it would probably be hard to change now). It adopted the convention of translating between a pathname's bytes and the internal OUString according to the system locale that OOo is run with (i.e., LANG/LC_ALL; see osl_getThreadTextEncoding).

(That of course means there can be problems, e.g. when a pathname consists of a sequence of bytes that is not valid according to osl_getThreadTextEncoding(), or when some internal OUString is to be translated to a pathname's sequence of bytes but contains Unicode letters that cannot be mapped to osl_getThreadTextEncoding(). OOo/LO have always been prone to such problems. In practice, their impact is reduced by people using a single, consistent system locale (text encoding) both to name their files and to run LO, and by many people exclusively using UTF-8 locales these days anyway.)

> Steps to Reproduce:
> 1. Have a unicode / UTF8 file system (that's standard I guess)

Traditional Unix (incl. Linux) file systems do not have an encoding; see above.

> 2. Have a file name with non-ASCII characters (łąka.png - 'LC_ALL=C ls -b'
> will show the correct UTF8 encoding \305\202\304\205ka.png)
> 3. Start LO with LANG=C / LC_ALL=C

This is the "user mistake". To operate well with files whose names are encoded in UTF-8, LO should be run with a UTF-8 locale. Otherwise, problems are to be expected (see above).

> 4. Open the file
> 5. Export the file
>
> Actual Results:
> 1. The file picker for "gen" shows the wrong file names. kde5 and gtk3 are
> fine.

Arguably, according to my above explanation, the gen file picker shows the right file name here. With LANG=C, osl_getThreadTextEncoding() effectively is RTL_TEXTENCODING_ISO_8859_1 (though technically it is RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.png".

The kde5 and gtk3 file pickers presumably use external library code that doesn't follow LO's convention of interpreting pathnames' byte sequences according to the system locale, but instead always interprets them as UTF-8. That would explain why the kde5 file picker dialog shows the file's name as "łąka.png" instead of "ÅÄka.png". But once the kde5 file picker has passed the URL (which is the same URL as the gen file picker passes) to LO's internals, LO will again treat that as representing a pathname whose bytes are interpreted according to osl_getThreadTextEncoding().

> 2. After opening, the window title has the file name with a wrong encoding.

Again, it is the right encoding according to the above.

> 3. The recent file list has the file name with wrong encoding (which
> actually works!)

Ditto...
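The convention from comment #1 and its two failure modes can be modelled in a few lines of Python. Under that convention, pathname bytes are converted to an OUString and back using the locale's text encoding. `path_to_internal` and `internal_to_path` are hypothetical stand-ins for the conversion LO performs with osl_getThreadTextEncoding(). Python codec names stand in for rtl text encodings, and ISO-8859-1 models LANG=C.

```python
def path_to_internal(raw: bytes, locale_encoding: str) -> str:
    """Hypothetical model: decode pathname bytes with the locale encoding."""
    return raw.decode(locale_encoding)

def internal_to_path(name: str, locale_encoding: str) -> bytes:
    """Hypothetical model: encode an internal string with the locale encoding."""
    return name.encode(locale_encoding)

raw = "łąka.png".encode("utf-8")  # b"\xc5\x82\xc4\x85ka.png"

# A UTF-8 locale (e.g. LANG=C.UTF-8) agrees with UTF-8-only toolkits.
assert path_to_internal(raw, "utf-8") == "łąka.png"

# LANG=C (modelled as ISO-8859-1) yields a different string that still
# maps back to the same bytes.
shown = path_to_internal(raw, "iso-8859-1")
assert internal_to_path(shown, "iso-8859-1") == raw

# Failure mode 1: bytes that are not valid in the locale encoding.
try:
    path_to_internal(b"\xff.png", "utf-8")
except UnicodeDecodeError as e:
    print("cannot decode:", e.reason)

# Failure mode 2: characters the locale encoding cannot represent.
try:
    internal_to_path("łąka.png", "iso-8859-1")
except UnicodeEncodeError as e:
    print("cannot encode:", e.reason)
```

The UTF-8 decode corresponds to what the kde5 and gtk3 pickers display. The ISO-8859-1 decode corresponds to the gen picker under LANG=C. Each string reaches the file as long as the same encoding is used in both directions.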
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Jan-Marek Glogowski changed:

What     |Removed |Added
See Also |        |https://bugs.documentfoundation.org/show_bug.cgi?id=125971
CC       |        |sberg...@redhat.com
Blocks   |        |102495, 32500

Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=32500
[Bug 32500] [META] GTK style doesn't draw some elements via GTK
https://bugs.documentfoundation.org/show_bug.cgi?id=102495
[Bug 102495] [META] KDE VCL backend bugs and enhancements