David Maus <dm...@ictsoc.de> writes:
> Sébastien Vauban wrote:
>>Hello,
>
>>With current git pull, and such an Org file (in UTF-8 encoding):
>
>> ...
>
>>I get the following error when trying to export it via PDFLaTeX:
>
> The problem is that the 'É' character is not in Org's default list
> for link escapes but `string-match' matches for the lower-case
> character.  Adding more chars to `org-link-escape-chars' would solve
> the problem, but this seems to be a broader issue:
>
> Regular links (URIs) are restricted to a special set of ASCII
> characters and non-ASCII chars are hex-encoded.  Currently Org escapes
> links to Org mode headlines using the table mentioned above.  But Org
> files and hence Org headlines might be Unicode, containing multibyte
> characters that cannot be hex-escaped in the normal fashion.
>
> Maybe something like this would be a solution:
>
>  - Org only escapes square brackets when escaping a link to an Org
>    mode headline
>  - `org-link-escape' uses a shotgun approach: every char that is not
>    allowed according to the specs (cf. RFC 3986) is percent-encoded if
>    the link sequence does not contain multibyte chars; if the sequence
>    does contain multibyte chars, `org-link-escape' produces an IRI
>    (cf. RFC 3987).



Is there a reason for this distinction between multibyte and unibyte?
If not, I favour the "shotgun approach".  It's bullet-proof.
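
To make the two variants concrete, here is a rough sketch in Python (not
Org's actual code).  The function name `escape_link' and the `allow_iri'
flag are made up for illustration, and the IRI branch is simplified: it
keeps every non-ASCII char, whereas RFC 3987 only allows certain ranges.

```python
# Unreserved characters per RFC 3986, section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def escape_link(text, allow_iri=True):
    """Hypothetical stand-in for the proposed escaping behaviour.

    allow_iri=True  -> non-ASCII chars are kept as they are (IRI-like).
    allow_iri=False -> pure shotgun approach: everything outside the
                       unreserved set is percent-encoded as UTF-8 bytes.
    """
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        elif ord(ch) > 127 and allow_iri:
            # Simplification: keep any non-ASCII char unescaped.
            out.append(ch)
        else:
            # Percent-encode each byte of the char's UTF-8 representation.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


iri_style = escape_link("École [1]")
shotgun_style = escape_link("École [1]", allow_iri=False)
```

With the flag off there is no multibyte/unibyte distinction at all: the
result is plain ASCII whatever the headline contains.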



The JavaScript function `encodeURIComponent()' encodes the German Umlaut
`ü' as `%C3%BC' regardless of the source's actual encoding.  That's why
I wrote the two functions `org-protocol-unhex-string' and
`org-protocol-unhex-compound' (see org-protocol.el).
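
The same behaviour can be reproduced with Python's standard library,
which is only used here as an illustration and has nothing to do with
org-protocol.el itself.  The sketch shows why the two escapes of a
multibyte char have to be decoded together as one UTF-8 sequence rather
than byte by byte:

```python
from urllib.parse import quote, unquote

# Percent-encode the UTF-8 bytes of the character, as
# encodeURIComponent() does.
encoded = quote("ü", safe="")

# Decoding both escapes together as UTF-8 restores the character.
decoded = unquote(encoded)

# Treating each escape as a single Latin-1 byte yields two wrong chars.
bytewise = unquote(encoded, encoding="latin-1")
```

Here `encoded' is "%C3%BC" and `decoded' is "ü" again, while `bytewise'
ends up as the two characters U+00C3 and U+00BC.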


I'll have to take a look at that RFC you mentioned :)



Best wishes

  Sebastian

_______________________________________________
Emacs-orgmode mailing list
Please use `Reply All' to send replies to the list.
Emacs-orgmode@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode
