[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2022-09-24 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Michael Warner  changed:

   What|Removed |Added

   See Also||https://bugs.documentfoundation.org/show_bug.cgi?id=151160

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2022-08-26 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

خالد حسني  changed:

   What|Removed |Added

 CC||m0rt3z...@gmail.com

--- Comment #6 from خالد حسني  ---
*** Bug 150567 has been marked as a duplicate of this bug. ***


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2022-08-26 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

خالد حسني  changed:

   What|Removed |Added

 CC||stsav...@gmail.com

--- Comment #5 from خالد حسني  ---
*** Bug 145758 has been marked as a duplicate of this bug. ***


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2022-08-26 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Mike Kaganski  changed:

   What|Removed |Added

   See Also||https://bugs.documentfoundation.org/show_bug.cgi?id=145758


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2022-08-25 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Mike Kaganski  changed:

   What|Removed |Added

   See Also||https://bugs.documentfoundation.org/show_bug.cgi?id=150567


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2020-11-10 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Buovjaga  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||ilmari.lauhakangas@libreoffice.org
 Status|UNCONFIRMED |NEW

--- Comment #4 from Buovjaga  ---
(In reply to Jan-Marek Glogowski from comment #0)
> Steps to Reproduce:
> 1. Have a unicode / UTF8 file system (that's standard I guess)
> 2. Have a file name with non-ASCII characters (łąka.png - 'LC_ALL=C ls -b'
> will show the correct UTF8 encoding \305\202\304\205ka.png)
> 3. Start LO with LANG=C / LC_ALL=C
> 4. Open the file
> 5. Export the file
> 
> Actual Results:
> 1. The file picker for "gen" shows the wrong file names. kde5 and gtk3 are
> fine.
> 2. After opening, the window title has the file name with a wrong encoding.
> 3. The recent file list has the file name with wrong encoding (which
> actually works!)
> 4. The save dialog has the wrong default name.

I confirm with gen. Step 4 should be "Insert - Image". I don't understand the
mentions of "after opening", "window title" and "recent file list", because
they don't apply to inserted images.

Arch Linux 64-bit
Version: 7.1.0.0.alpha1+
Build ID: c9b320c32aceab7e22d381b688e7b44030e01c2d
CPU threads: 8; OS: Linux 5.9; UI render: default; VCL: x11
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 8 November 2020

Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2019-10-22 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Xisco Faulí  changed:

   What|Removed |Added

 CC||xiscofa...@libreoffice.org
   Keywords||needsDevAdvice


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2019-06-19 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #3 from Jan-Marek Glogowski  ---
Thanks for the input. The gen file picker also works - as expected - if I set
LANG=C.UTF-8. I forgot that, thanks for the reminder.

(In reply to Stephan Bergmann from comment #1)
> (In reply to Jan-Marek Glogowski from comment #0)
> 
> I'm not sure why you qualify this issue with "currently".  The behavior
> should be as it is ever since OOo.

Ok.

So I tried to find the definition for the C / POSIX locale and found:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html and
http://port70.net/~nsz/c/c89/c89-draft.html

What I actually couldn't find is any setting that says anything about the
filesystem encoding. If you say file names are just strings, then LC_CTYPE
might apply, but IMHO that's a long shot today.

So the question now is: should / can we change the C locale to interpret
filenames as UTF-8, which is much more common today, instead of as
ISO_8859_1?
I don't know whether that actually matches the C locale definition. And I'm
also not sure it is feasible in a sensible manner in LO, but somehow Qt and
Gtk+ make this assumption and act accordingly.

I'm not aware of any other system API that could be queried for something
like that. I tested gimp on Windowmaker with xterm and LANG=C. Still, there
might be some system setting.


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2019-06-19 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #2 from Stephan Bergmann  ---
(In reply to Stephan Bergmann from comment #1)
> Arguably, according to my above explanation, the gen file picker shows the
> right file name here.  With LANG=C, osl_getThreadTextEncoding() effectively
> is RTL_TEXTENCODING_ISO_8859_1 (though technically it is
> RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.jpg".

(Above and below, Bugzilla apparently dropped the C1 control characters \U+0082
and \U+0085 from "ÅÄka.jpg", where they should appear after "Å" and after "Ä",
respectively.)

> The kde5 and gtk3 file pickers presumably use external library code that
> doesn't follow LO's convention of interpreting pathnames' byte sequences
> according to the system locale, but instead always interpret them as UTF-8. 
> That would explain why the kde5 file picker dialog shows the file's name as
> "łąka.png" instead of "ÅÄka.jpg".  But once the kde5 file picker has passed
> the  URL (which is the same URL as the gen
> file picker passes) to LO's internals, LO will again treat that as
> representing a pathname whose bytes are interpreted according to
> osl_getThreadTextEncoding().

Sorry, the above "which is the same URL as the gen file picker passes" is
wrong:  With LANG=C, LO interprets that file name as written with the
characters

  \U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
  \U+0082 
  \U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  \U+0085 
  \U+006B LATIN SMALL LETTER K
  ...

and "LO internal file URLs" always have their "payload" encoded as UTF-8 (see
udkapi/com/sun/star/uri/XExternalUriReferenceTranslator.idl), so the LO
internal file URL that the gen file picker returns is
.  (And when LO wants to access the
actual file and converts that URL back to a pathname byte sequence under
LANG=C, it first converts from the URL syntax "%C3%85%C2%82%C3%84%C2%85ka.png"
to an OUString containing

  \U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
  \U+0082 
  \U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
  \U+0085 
  \U+006B LATIN SMALL LETTER K
  ...

code units, and then, because of the osl_getThreadTextEncoding() mandated by
LANG=C, to the correct byte sequence "\xC5\x82\xC4\x85ka.png".)
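
The round trip described above can be sketched in a few lines of Python. This is only an illustration, not LO's actual code: ISO-8859-1 is assumed as a stand-in for what osl_getThreadTextEncoding() effectively yields under LANG=C, and the helper names pathname_to_url_payload and url_payload_to_pathname are hypothetical.

```python
from urllib.parse import quote, unquote

# Assumption: ISO-8859-1 stands in for the effective thread text
# encoding under LANG=C (as described in the comment above).
THREAD_ENCODING = "iso-8859-1"


def pathname_to_url_payload(raw: bytes) -> str:
    """Pathname bytes -> internal Unicode string -> UTF-8 percent-encoded payload."""
    internal = raw.decode(THREAD_ENCODING)
    # quote() percent-encodes non-ASCII characters as UTF-8 by default.
    return quote(internal, safe="")


def url_payload_to_pathname(payload: str) -> bytes:
    """UTF-8 percent-encoded payload -> internal Unicode string -> pathname bytes."""
    internal = unquote(payload)  # decodes the escapes as UTF-8 by default
    return internal.encode(THREAD_ENCODING)


raw = b"\xC5\x82\xC4\x85ka.png"  # UTF-8 bytes of the file name on disk
payload = pathname_to_url_payload(raw)
print(payload)
print(url_payload_to_pathname(payload) == raw)
```

The intermediate string looks wrong to a human reader, but because the same encoding is applied in both directions, the original byte sequence is recovered and the file can still be accessed.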


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2019-06-19 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

--- Comment #1 from Stephan Bergmann  ---
(In reply to Jan-Marek Glogowski from comment #0)
> Description:
> This is the "extension" of bug 125971.
> 
> Something in the local file URL handling is currently broken when you use
> the C locale, at least on all unix backends. I can't test MacOS and Windows,
> but since I suspect an error in the URL handling with regard to the current
> locale setting, at least MacOS might be affected too. Has Windows some
> equivalent of C locale?

I'm not sure why you qualify this issue with "currently".  The behavior should
be as it is ever since OOo.

A traditional Unix (incl. Linux) file name is just a sequence of bytes, without
a means of specifying in what encoding to interpret those bytes.  Ever since
OOo was made Unicode-aware, it wanted to represent pathnames internally as
Unicode (UTF-16) strings (whether or not that was a good decision, its
consequences permeate the code base and it would probably be hard to change it
now).  It adopted the convention of translating between a pathname's bytes and
the internal OUString according to the system locale that OOo is run with
(i.e., LANG/LC_ALL; see osl_getThreadTextEncoding).  (That of course means that
there can be problems, e.g. when a pathname consists of a sequence of bytes
that is not valid according to osl_getThreadTextEncoding(), or when some
internal OUString is to be translated to a pathname's sequence of bytes but
contains Unicode letters that cannot be mapped to osl_getThreadTextEncoding().
OOo/LO have always been prone to such problems.  In practice, their impact is
reduced by people using a single, consistent system locale (text encoding) to
name their files and to run LO, and by many people exclusively using UTF-8
locales anyway these days.)
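
The two failure modes mentioned in the parenthesis above can be illustrated with a small Python sketch. It only mimics the idea with Python's own codecs; the helpers try_decode and try_encode are hypothetical and do not correspond to any LO function.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return the decoded name, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def try_encode(name: str, encoding: str) -> Optional[bytes]:
    """Return the pathname bytes, or None if a character cannot be mapped."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


utf8_bytes = b"\xC5\x82\xC4\x85ka.png"

# Failure mode 1: byte sequence not valid in the locale's encoding
# (here: strict ASCII, the technical reading of the C locale).
print(try_decode(utf8_bytes, "ascii"))

# Failure mode 2: Unicode letters that the locale's encoding cannot represent.
print(try_encode("łąka.png", "iso-8859-1"))

# With a consistent UTF-8 locale both directions work.
print(try_encode("łąka.png", "utf-8") == utf8_bytes)
```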

> Steps to Reproduce:
> 1. Have a unicode / UTF8 file system (that's standard I guess)

Traditional Unix (incl. Linux) file systems do not have an encoding, see above.

> 2. Have a file name with non-ASCII characters (łąka.png - 'LC_ALL=C ls -b'
> will show the correct UTF8 encoding \305\202\304\205ka.png)
> 3. Start LO with LANG=C / LC_ALL=C

This is the "user mistake".  To operate well with files whose names are encoded
in UTF-8, LO should be run with a UTF-8 locale.  Otherwise, problems are
expected to occur (see above).

> 4. Open the file
> 5. Export the file
> 
> Actual Results:
> 1. The file picker for "gen" shows the wrong file names. kde5 and gtk3 are
> fine.

Arguably, according to my above explanation, the gen file picker shows the
right file name here.  With LANG=C, osl_getThreadTextEncoding() effectively is
RTL_TEXTENCODING_ISO_8859_1 (though technically it is
RTL_TEXTENCODING_ASCII_US), so you get "ÅÄka.jpg".

The kde5 and gtk3 file pickers presumably use external library code that
doesn't follow LO's convention of interpreting pathnames' byte sequences
according to the system locale, but instead always interpret them as UTF-8. 
That would explain why the kde5 file picker dialog shows the file's name as
"łąka.png" instead of "ÅÄka.jpg".  But once the kde5 file picker has passed
the  URL (which is the same URL as the gen file
picker passes) to LO's internals, LO will again treat that as representing a
pathname whose bytes are interpreted according to osl_getThreadTextEncoding().
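
As a rough illustration of the difference between the two interpretations, the following Python sketch decodes the same pathname bytes once as UTF-8 (what the kde5 and gtk3 pickers presumably do) and once as ISO-8859-1 (assumed here as the effective encoding under LANG=C). It is a sketch of the idea only, not code from LO, Qt or GTK.

```python
raw = b"\xC5\x82\xC4\x85ka.png"  # the bytes stored in the directory entry

# Interpretation 1: always UTF-8, regardless of the locale.
as_utf8 = raw.decode("utf-8")

# Interpretation 2: according to the locale; ISO-8859-1 is the
# assumed effective encoding for LANG=C.
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)
# Show the code points, since U+0082 and U+0085 are invisible C1 controls.
print([f"U+{ord(c):04X}" for c in as_latin1[:5]])
```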

> 2. After opening, the window title has the file name with a wrong encoding.

Again, it is the right encoding according to the above.

> 3. The recent file list has the file name with wrong encoding (which
> actually works!)

ditto...


[Libreoffice-bugs] [Bug 125995] C locale is currently broken for file handling

2019-06-18 Thread bugzilla-daemon
https://bugs.documentfoundation.org/show_bug.cgi?id=125995

Jan-Marek Glogowski  changed:

   What|Removed |Added

   See Also||https://bugs.documentfoundation.org/show_bug.cgi?id=125971
 CC||sberg...@redhat.com
 Blocks||102495, 32500


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=32500
[Bug 32500] [META] GTK style doesn't draw some elements via GTK
https://bugs.documentfoundation.org/show_bug.cgi?id=102495
[Bug 102495] [META] KDE VCL backend bugs and enhancements