Re: A question about com.sun.star.frame.XStorable's URL
Hi Stephan,

Thanks a lot for your reply.

On Mon, 23 Jan 2017 10:26:09 +0100, Stephan Bergmann wrote:
> On 01/20/2017 03:25 AM, Takeshi Abe wrote:
>> Preparing a patch for tdf#105382 [1], I came across a question about
>> character encoding for the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original (before percent-encoding) path of such a URL can
>> be in an encoding other than UTF-8 or even in a different charset due
>> to e.g. a code page of some legacy filesystems.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
>
> A conforming URL itself, by definition, is written with a subset of
> ASCII-only characters.
>
> For file URLs, there never was a definition of how to interpret the octets
> encoded in the URL's path component, so OOo/LO came up with the convention
> of always interpreting those as UTF-8. (So any code that converts between
> file URLs and native pathnames needs to do that mapping between UTF-8 and
> the relevant native pathname encoding, which LO assumes to be as reported
> by osl_getThreadTextEncoding.)

Got it. What should be done for tdf#105382 is clear now.
IIUC, the basic strategy for encoding a file URL for UNO is the same as the
one the current standard [1] describes in section "2.5. Identifying Data":

> (...) A
> system that internally provides identifiers in the form of a
> different character encoding, such as EBCDIC, will generally perform
> character translation of textual identifiers to UTF-8 [STD63] (or
> some other superset of the US-ASCII character encoding) at an
> internal interface, thereby providing more meaningful identifiers
> than those resulting from simply percent-encoding the original
> octets.

[1] https://tools.ietf.org/html/rfc3986

Cheers,
--
Takeshi Abe
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice
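The RFC 3986 section 2.5 strategy quoted above (translate the native identifier to UTF-8 first, then percent-encode the resulting octets) could be sketched roughly as follows. This is standalone C++, not LibreOffice code: the function names are hypothetical, and the "legacy" native encoding is assumed to be ISO-8859-1 purely for illustration. In LibreOffice itself this conversion is done by the osl layer (e.g. `osl::FileBase::getFileURLFromSystemPath`), not by hand.

```cpp
#include <cassert>
#include <string>

// Assumption for illustration: the legacy encoding is ISO-8859-1, where each
// byte value equals the Unicode code point it represents.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);                 // ASCII is unchanged
        } else {
            // U+0080..U+00FF need a two-byte UTF-8 sequence.
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}

char hexDigit(unsigned v)
{
    assert(v < 16);
    return "0123456789ABCDEF"[v];
}

// Percent-encode every octet outside RFC 3986's "unreserved" set, keeping '/'
// so that path segment separators survive.
std::string percentEncodePath(const std::string& octets)
{
    std::string out;
    for (unsigned char c : octets) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                          c == '_' || c == '~';
        if (unreserved || c == '/') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hexDigit(c >> 4);
            out += hexDigit(c & 0x0F);
        }
    }
    return out;
}
```

With these helpers, the Latin-1 path `/tmp/caf\xE9.odt` becomes `/tmp/caf%C3%A9.odt`: the URL carries the UTF-8 octets of "é" rather than the raw legacy byte `%E9`, which is the "more meaningful identifier" the RFC is talking about.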
Re: A question about com.sun.star.frame.XStorable's URL
Hi Miklos,

Thank you for your answer.

On Mon, 23 Jan 2017 10:10:45 +0100, Miklos Vajna wrote:
> On Fri, Jan 20, 2017 at 11:25:00AM +0900, Takeshi Abe wrote:
>> Preparing a patch for tdf#105382 [1], I came across a question about
>> character encoding for the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original (before percent-encoding) path of such a URL can
>> be in an encoding other than UTF-8 or even in a different charset due
>> to e.g. a code page of some legacy filesystems.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
>
> The UNO API works with UNOIDL strings, which are represented in C++ as
> OUStrings, i.e. arrays of Unicode (UTF-16) code units.
>
> I think this means you have to decide the encoding when you convert your
> OString (or other byte array) data to OUString, before calling any UNO
> API.

OK, that confirms my understanding of the UNO API. But it still leaves open
whether the original path can be in a foreign encoding, since the
percent-encoded form contains only ASCII characters (and so can be passed
through the UNO interface transparently).

Cheers,
--
Takeshi Abe
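The transparency point can be made concrete with a small standalone sketch (plain C++, no LibreOffice APIs; all helper names are hypothetical). Both a legacy-style escape such as `%E9` and the UTF-8 escapes `%C3%A9` for "é" are pure ASCII, so either URL survives a trip through a UNO string unchanged; only after percent-decoding do the octets differ, and nothing in the URL itself says how to interpret them.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// True if every byte is 7-bit ASCII, so the string is unaffected by the choice
// of any ASCII-compatible encoding when it is turned into a Unicode string.
bool isAsciiOnly(const std::string& s)
{
    for (unsigned char c : s)
        if (c >= 0x80)
            return false;
    return true;
}

// Value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decode %XX escapes into raw octets; std::nullopt on a malformed escape.
// The result is just bytes: nothing here says which encoding they are in.
std::optional<std::string> percentDecode(const std::string& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '%') {
            out += s[i];
            continue;
        }
        if (i + 2 >= s.size())
            return std::nullopt;  // '%' not followed by two characters
        int hi = hexValue(s[i + 1]);
        int lo = hexValue(s[i + 2]);
        if (hi < 0 || lo < 0)
            return std::nullopt;
        out += static_cast<char>(hi * 16 + lo);
        i += 2;
    }
    return out;
}
```

Here `file:///tmp/caf%E9.odt` and `file:///tmp/caf%C3%A9.odt` both pass `isAsciiOnly`, yet decode to the single octet `0xE9` and to the pair `0xC3 0xA9` respectively; deciding what those octets mean requires an agreed convention.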
Re: A question about com.sun.star.frame.XStorable's URL
On 01/20/2017 03:25 AM, Takeshi Abe wrote:
> Preparing a patch for tdf#105382 [1], I came across a question about
> character encoding for the path part of a URL representing a
> com.sun.star.frame.XStorable's location.
> I wonder if the original (before percent-encoding) path of such a URL can
> be in an encoding other than UTF-8 or even in a different charset due
> to e.g. a code page of some legacy filesystems.
> Is it possible?
> And, if so, is there any reasonable way to tell the encoding?

A conforming URL itself, by definition, is written with a subset of
ASCII-only characters.

For file URLs, there never was a definition of how to interpret the octets
encoded in the URL's path component, so OOo/LO came up with the convention
of always interpreting those as UTF-8. (So any code that converts between
file URLs and native pathnames needs to do that mapping between UTF-8 and
the relevant native pathname encoding, which LO assumes to be as reported
by osl_getThreadTextEncoding.)
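A minimal sketch of the native side of that convention: octets taken from an already percent-decoded file URL path are read as UTF-8 and re-encoded in the native pathname encoding. This is standalone C++ with a hypothetical function name, and it assumes ISO-8859-1 as the native encoding purely for illustration; LibreOffice instead uses whatever `osl_getThreadTextEncoding` reports, inside its osl file functions (e.g. `osl::FileBase::getSystemPathFromFileURL`).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical: map UTF-8 octets (from a percent-decoded file URL path) to a
// native pathname, ASSUMING the native encoding is ISO-8859-1.
// Returns std::nullopt if the octets are not valid UTF-8 or contain a
// character that ISO-8859-1 cannot represent.
std::optional<std::string> utf8ToNativeLatin1(const std::string& utf8)
{
    std::string native;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = utf8[i];
        if (c < 0x80) {                                // ASCII maps to itself
            native += static_cast<char>(c);
            ++i;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            unsigned char c2 = utf8[i + 1];
            if ((c2 & 0xC0) != 0x80)
                return std::nullopt;                   // bad continuation byte
            unsigned cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
            if (cp < 0x80 || cp > 0xFF)
                return std::nullopt;                   // overlong, or not Latin-1
            native += static_cast<char>(cp);
            i += 2;
        } else {
            // Truncated sequences, stray continuation bytes and 3-/4-byte
            // sequences (U+0800 and above) end up here: either invalid UTF-8
            // or not representable in Latin-1.
            return std::nullopt;
        }
    }
    return native;
}
```

The sketch fails instead of substituting a replacement character, because a silently altered pathname would point at a different (or nonexistent) file. For example, `/tmp/caf\xC3\xA9.odt` maps to `/tmp/caf\xE9.odt`, while "€" (`\xE2\x82\xAC`) has no Latin-1 form and is rejected.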
Re: A question about com.sun.star.frame.XStorable's URL
Hi,

On Fri, Jan 20, 2017 at 11:25:00AM +0900, Takeshi Abe wrote:
> Preparing a patch for tdf#105382 [1], I came across a question about
> character encoding for the path part of a URL representing a
> com.sun.star.frame.XStorable's location.
> I wonder if the original (before percent-encoding) path of such a URL can
> be in an encoding other than UTF-8 or even in a different charset due
> to e.g. a code page of some legacy filesystems.
> Is it possible?
> And, if so, is there any reasonable way to tell the encoding?

The UNO API works with UNOIDL strings, which are represented in C++ as
OUStrings, i.e. arrays of Unicode (UTF-16) code units.

I think this means you have to decide the encoding when you convert your
OString (or other byte array) data to OUString, before calling any UNO
API.

Regards,

Miklos
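In LibreOffice code, the decision described above is the `rtl_TextEncoding` argument passed when an `OUString` is built from bytes (for instance `OStringToOUString(aBytes, RTL_TEXTENCODING_UTF8)`). The standalone sketch below mimics that step with `std::u16string` standing in for `OUString`; the `ByteEncoding` enum and `toUtf16` function are hypothetical and support only UTF-8 and ISO-8859-1.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for rtl_TextEncoding, limited to two encodings.
enum class ByteEncoding { Utf8, Latin1 };

// Append one Unicode code point as UTF-16, using a surrogate pair above U+FFFF.
void appendUtf16(std::u16string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
}

// Convert raw bytes to UTF-16, interpreting them in the caller-chosen encoding.
// Returns std::nullopt if the bytes are not valid in that encoding.
std::optional<std::u16string> toUtf16(const std::string& bytes, ByteEncoding enc)
{
    std::u16string out;
    if (enc == ByteEncoding::Latin1) {
        for (unsigned char c : bytes)  // each Latin-1 byte equals its code point
            out += static_cast<char16_t>(c);
        return out;
    }
    static const char32_t leadMask[] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = bytes[i];
        std::size_t len = c < 0x80 ? 1
                        : (c & 0xE0) == 0xC0 ? 2
                        : (c & 0xF0) == 0xE0 ? 3
                        : (c & 0xF8) == 0xF0 ? 4 : 0;
        if (len == 0 || i + len > bytes.size())
            return std::nullopt;  // invalid lead byte or truncated sequence
        char32_t cp = c & leadMask[len];
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = bytes[i + k];
            if ((cc & 0xC0) != 0x80)
                return std::nullopt;  // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;  // overlong form, out of range, or surrogate
        appendUtf16(out, cp);
        i += len;
    }
    return out;
}
```

The same bytes `caf\xC3\xA9` become "café" when declared as UTF-8 but "cafÃ©" when declared as ISO-8859-1, and `caf\xE9` is rejected as UTF-8. That is why the encoding has to be settled at this boundary, before the string reaches any UNO call.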
A question about com.sun.star.frame.XStorable's URL
Hi,

Preparing a patch for tdf#105382 [1], I came across a question about
character encoding for the path part of a URL representing a
com.sun.star.frame.XStorable's location.
I wonder if the original (before percent-encoding) path of such a URL can
be in an encoding other than UTF-8 or even in a different charset due
to e.g. a code page of some legacy filesystems.
Is it possible?
And, if so, is there any reasonable way to tell the encoding?

[1] https://gerrit.libreoffice.org/#/c/33261/

Cheers,
--
Takeshi Abe