Re: A question about com.sun.star.frame.XStorable's URL

2017-01-23 Thread Takeshi Abe
Hi Stephan,

Thanks a lot for your reply.

On Mon, 23 Jan 2017 10:26:09 +0100, Stephan Bergmann wrote:
> On 01/20/2017 03:25 AM, Takeshi Abe wrote:
>> While preparing a patch for tdf#105382 [1], I came across a question about
>> the character encoding of the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original path of such a URL (before percent-encoding) can
>> be in an encoding other than UTF-8, or even in a different charset, due
>> to e.g. the code page of some legacy filesystem.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
> 
> A conforming URL itself, by definition, is written with a subset of ASCII-only
> characters.
> 
> For file URLs, there never was a definition how to interpret the octets
> encoded in the URL's path component, so OOo/LO came up with the convention
> of always interpreting those as UTF-8.  (So any code that converts between
> file URLs and native pathnames needs to do that mapping between UTF-8 and
> the relevant native pathname encoding, which LO assumes to be as reported
> by osl_getThreadTextEncoding.)
Got it. What should be done for tdf#105382 is clear now.

IIUC the basic strategy for encoding a file URL for UNO is the same as the one
that a current standard [1] describes in section "2.5. Identifying Data":
> (...) A
> system that internally provides identifiers in the form of a
> different character encoding, such as EBCDIC, will generally perform
> character translation of textual identifiers to UTF-8 [STD63] (or
> some other superset of the US-ASCII character encoding) at an
> internal interface, thereby providing more meaningful identifiers
> than those resulting from simply percent-encoding the original
> octets.

[1] https://tools.ietf.org/html/rfc3986

Cheers,
-- Takeshi Abe
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice


Re: A question about com.sun.star.frame.XStorable's URL

2017-01-23 Thread Takeshi Abe
Hi Miklos,

Thank you for your answer.

On Mon, 23 Jan 2017 10:10:45 +0100, Miklos Vajna wrote:
> On Fri, Jan 20, 2017 at 11:25:00AM +0900, Takeshi Abe wrote:
>> While preparing a patch for tdf#105382 [1], I came across a question about
>> the character encoding of the path part of a URL representing a
>> com.sun.star.frame.XStorable's location.
>> I wonder if the original path of such a URL (before percent-encoding) can
>> be in an encoding other than UTF-8, or even in a different charset, due
>> to e.g. the code page of some legacy filesystem.
>> Is it possible?
>> And, if so, is there any reasonable way to tell the encoding?
> 
> The UNO API works with UNOIDL strings, where those strings are
> represented as OUStrings in C++, which is an array of Unicode
> characters.
> 
> I think this means you have to decide encoding when you convert your
> OString (or other byte array) data to OUString, before calling any UNO
> API.
OK, that confirms my understanding of the UNO API. But it still leaves open
whether the original path can be in a foreign encoding, as the
percent-encoded one contains only ASCII characters (so it may be passed
to a UNO interface transparently).
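
To illustrate the point, here is a minimal Python sketch (not LibreOffice code;
the file name and the helper decodes_as_utf8 are made up for illustration). It
percent-encodes the same name from two different source encodings. Both
results are pure ASCII, but only one of them decodes as UTF-8:

```python
from urllib.parse import quote, unquote


def decodes_as_utf8(percent_encoded_path: str) -> bool:
    """Return True if the percent-decoded octets form valid UTF-8."""
    try:
        unquote(percent_encoded_path, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical file name, used only for this example.
name = "日本語.odt"

# Percent-encode the octets of the name in two different source encodings.
from_utf8 = quote(name, encoding="utf-8")
from_sjis = quote(name, encoding="shift_jis")

# Both strings are ASCII-only, so either one passes through a string API.
print(from_utf8, from_utf8.isascii(), decodes_as_utf8(from_utf8))
print(from_sjis, from_sjis.isascii(), decodes_as_utf8(from_sjis))
```

Note that such a check is only a heuristic: some octet sequences from a legacy
encoding may happen to be valid UTF-8 as well.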

Cheers,
-- Takeshi Abe


Re: A question about com.sun.star.frame.XStorable's URL

2017-01-23 Thread Stephan Bergmann

On 01/20/2017 03:25 AM, Takeshi Abe wrote:

> While preparing a patch for tdf#105382 [1], I came across a question about
> the character encoding of the path part of a URL representing a
> com.sun.star.frame.XStorable's location.
> I wonder if the original path of such a URL (before percent-encoding) can
> be in an encoding other than UTF-8, or even in a different charset, due
> to e.g. the code page of some legacy filesystem.
> Is it possible?
> And, if so, is there any reasonable way to tell the encoding?


A conforming URL itself, by definition, is written with a subset of 
ASCII-only characters.


For file URLs, there never was a definition how to interpret the octets 
encoded in the URL's path component, so OOo/LO came up with the 
convention of always interpreting those as UTF-8.  (So any code that 
converts between file URLs and native pathnames needs to do that mapping 
between UTF-8 and the relevant native pathname encoding, which LO 
assumes to be as reported by osl_getThreadTextEncoding.)
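
The mapping described above can be sketched in Python as follows. This is only
an illustration of the convention, not LibreOffice's actual implementation:
the function names are hypothetical, and the native_encoding parameter stands
in for whatever osl_getThreadTextEncoding would report.

```python
from urllib.parse import quote, unquote

FILE_SCHEME = "file://"


def native_path_to_file_url(raw_path: bytes, native_encoding: str) -> str:
    """Map a native pathname (bytes) to a file URL with UTF-8 octets."""
    # Step 1: interpret the native bytes in the native pathname encoding.
    text = raw_path.decode(native_encoding)
    # Step 2: percent-encode the UTF-8 octets of that text, keeping "/".
    return FILE_SCHEME + quote(text, safe="/", encoding="utf-8")


def file_url_to_native_path(url: str, native_encoding: str) -> bytes:
    """Map a file URL back to a native pathname (bytes)."""
    if not url.startswith(FILE_SCHEME):
        raise ValueError("not a file URL: " + url)
    # Step 1: percent-decode and interpret the octets as UTF-8.
    text = unquote(url[len(FILE_SCHEME):], encoding="utf-8", errors="strict")
    # Step 2: re-encode the text in the native pathname encoding.
    return text.encode(native_encoding)


# Hypothetical pathname on a system whose native encoding is Shift_JIS.
raw = "/tmp/日本語.odt".encode("shift_jis")
url = native_path_to_file_url(raw, "shift_jis")
print(url)
print(file_url_to_native_path(url, "shift_jis") == raw)
```

The URL itself never carries the native encoding; it is applied only at the
boundary where the URL is turned into a pathname or vice versa.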




Re: A question about com.sun.star.frame.XStorable's URL

2017-01-23 Thread Miklos Vajna
Hi,

On Fri, Jan 20, 2017 at 11:25:00AM +0900, Takeshi Abe wrote:
> While preparing a patch for tdf#105382 [1], I came across a question about
> the character encoding of the path part of a URL representing a
> com.sun.star.frame.XStorable's location.
> I wonder if the original path of such a URL (before percent-encoding) can
> be in an encoding other than UTF-8, or even in a different charset, due
> to e.g. the code page of some legacy filesystem.
> Is it possible?
> And, if so, is there any reasonable way to tell the encoding?

The UNO API works with UNOIDL strings, where those strings are
represented as OUStrings in C++, which is an array of Unicode
characters.

I think this means you have to decide encoding when you convert your
OString (or other byte array) data to OUString, before calling any UNO
API.
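
As a rough analogy in Python rather than C++ (the byte string below is a
made-up example), the same bytes give different text depending on the encoding
chosen at conversion time, so the choice has to be made explicitly at that
point:

```python
# "日本語" as encoded in Shift_JIS; the bytes alone do not say so.
raw = b"\x93\xfa\x96\x7b\x8c\xea"

# The caller decides the encoding when converting bytes to a Unicode string.
as_sjis = raw.decode("shift_jis")
as_cp1252 = raw.decode("cp1252")

# Decoding with the wrong assumption may also fail outright.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(as_sjis)
print(as_cp1252)
print(utf8_ok)
```

In C++ the same decision is visible in the conversion call itself, since
OStringToOUString takes the text encoding as an argument.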

Regards,

Miklos




A question about com.sun.star.frame.XStorable's URL

2017-01-20 Thread Takeshi Abe
Hi,

While preparing a patch for tdf#105382 [1], I came across a question about
the character encoding of the path part of a URL representing a
com.sun.star.frame.XStorable's location.
I wonder if the original path of such a URL (before percent-encoding) can
be in an encoding other than UTF-8, or even in a different charset, due
to e.g. the code page of some legacy filesystem.
Is it possible?
And, if so, is there any reasonable way to tell the encoding?

[1] https://gerrit.libreoffice.org/#/c/33261/

Cheers,
-- Takeshi Abe