Hi Andreas,

I think what you are observing is the following: 

In UTF-8, a character can be encoded as a multi-byte sequence; the number of 
leading 1 bits in the first byte defines the length of that sequence
(cf. https://en.wikipedia.org/wiki/UTF-8).

In your example, the byte C3 has the following bit pattern: 
> xs:hexBinary("c3")
> => convert:binary-to-integers()
> => for-each(convert:integer-to-base(?,2))
> (: returns: 11000011 :)
The two leading 1s tell the UTF-8 decoder to read a second byte, which is 
missing. Hence decoding fails with an error, or, if you use the 
fallback option, it returns a � (U+FFFD, the replacement character).
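
To make this concrete, here is a small sketch. It assumes a BaseX version in 
which convert:binary-to-string accepts a third (fallback) argument, so please 
check the Conversion Module documentation of your release. The first call 
should give you ä, the second the replacement character, and the third Ã:

```xquery
(: Sketch only: assumes BaseX's Conversion Module and that the
   three-argument form with a fallback flag exists in your version. :)
let $complete := xs:hexBinary('C3A4')  (: lead byte plus continuation byte :)
let $partial  := xs:hexBinary('C3')    (: lead byte only, sequence is cut off :)
return (
  (: a complete two-byte UTF-8 sequence :)
  convert:binary-to-string($complete, 'UTF-8'),
  (: incomplete sequence, decoded with the fallback enabled :)
  convert:binary-to-string($partial, 'UTF-8', true()),
  (: the same single byte interpreted as ISO-8859-1 :)
  convert:binary-to-string($partial, 'ISO-8859-1')
)
```

Without the fallback argument, the second call is the one that raises the 
error.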

Decoding ASCII is no problem: ASCII uses only 7 bits (code points 0 to 127), 
and UTF-8 encodes these characters with the same single-byte values.

Your "C3" byte, however, is not ASCII but most probably ISO-8859-1 or 
CP1252. So while a glance at https://tools.ietf.org/html/rfc3986 says URI 
characters should be encoded in UTF-8, in practice chances are you will 
encounter values that are encoded using some "local" encoding.

If your string is not UTF-8 encoded, you can only guess what the correct 
encoding is.

You could send a predefined string that is known to be two bytes long in 
UTF-8, such as ä. It will be encoded either as "%C3%A4" if the client uses 
UTF-8, or as a well-known single byte, for example "E4" in ISO-8859-1.
Depending on what you receive from your client for that string, you can 
assume it encodes its data as either UTF-8 or Latin-1. 

You can check what your string would be encoded to: 
> string(convert:string-to-hex('ä',"latin1")) 
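
If it helps, the guessing could be sketched roughly like this. local:decode 
is a hypothetical helper name, and ISO-8859-1 as the second choice is only an 
assumption; CP1252 or something else may fit your data better. It tries UTF-8 
first and only falls back if the decoder raises an error:

```xquery
(: Hypothetical helper: decode a hex string as UTF-8 and, if that fails,
   assume ISO-8859-1. This is a heuristic, not a reliable detection. :)
declare function local:decode($hex as xs:string) as xs:string {
  let $bytes := xs:hexBinary($hex)
  return try {
    (: valid UTF-8 (including plain ASCII) is decoded here :)
    convert:binary-to-string($bytes, 'UTF-8')
  } catch * {
    (: invalid or incomplete UTF-8: fall back to the assumed local encoding :)
    convert:binary-to-string($bytes, 'ISO-8859-1')
  }
};

(: a complete UTF-8 sequence, a lone Latin-1 byte, and your lone C3 :)
for $hex in ('C3A4', 'E4', 'C3')
return local:decode($hex)
```

Keep in mind that every byte sequence is valid ISO-8859-1, so the fallback 
always returns something, whether or not it is what the sender meant.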


Sorry for the long mail, hope the explanation is useful for you, even though 
the solution is not sooo simple and involves guessing :-)

Best
Michael 





> On 09.06.2019 at 17:09, Andreas Mixich <mixich.andr...@gmail.com> wrote:
> 
> How can I simply get back any character, readable by a human, from a 
> hexadecimal value?
> 
