Re: wide-char is wide

Hans Aberg Wed, 25 Mar 2009 14:00:30 -0700


On 25 Mar 2009, at 17:55, Francisco Vila wrote:

I am now confused because Trevor has said that the hex value is a
variable length coding value for the Unicode entity, therefore this
hex number has to follow the utf-8 rules, not utf-32 which is always a
32bit fixed-length value.

...

... after Trevor I now think the hex value _is_ utf-8
coded. I might be completely wrong.


You might search this page for "code point":
  http://en.wikipedia.org/wiki/Unicode

It is just a natural number assigned to each abstract character that Unicode defines. The section

  http://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology

describes the convention of writing these numbers with the prefix "U+": numbers below 2^16 are written with four hex digits, and others with five or six as needed.
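
As a rough illustration of that notation, here is a small Python sketch. The helper name u_plus is made up for this example, and the sample characters are my own choice:

```python
def u_plus(code_point: int) -> str:
    """Format a code point in the conventional U+ notation."""
    # At least four hex digits; larger values use five or six as needed.
    return f"U+{code_point:04X}"


print(u_plus(ord("A")))       # U+0041
print(u_plus(ord("\u00e9")))  # U+00E9
print(u_plus(0x1D11E))        # U+1D11E (MUSICAL SYMBOL G CLEF)
```

Note that the result is only a way of writing the number; no encoding is involved yet.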

Then, in order to get it into a computer, one uses an encoding that translates these numbers into byte sequences. Among these are UTF-8, UTF-16 and UTF-32. The last, UTF-32, ought to be the simplest, because it just takes the code point as a binary number, but since one does not agree on how to order the bytes in a computer, there are two: UTF-32BE (big endian, used by PowerPC) and UTF-32LE (little endian, used by Intel PCs). Similarly for UTF-16, which was invented in the days when one thought 16 bits would be enough for all of Unicode, but was later extended in an irregular way.
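
To make the byte-order point concrete, a minimal Python sketch using the standard codecs. The G clef character is just an example I picked because it lies above 2^16:

```python
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, code point U+1D11E

utf32_be = ch.encode("utf-32-be")
utf32_le = ch.encode("utf-32-le")
utf16_be = ch.encode("utf-16-be")

print(utf32_be.hex())  # 0001d11e  (the code point itself, big endian)
print(utf32_le.hex())  # 1ed10100  (same four bytes, reversed order)
print(utf16_be.hex())  # d834dd1e  (two 16-bit units, not the code point)
```

The UTF-32 bytes are the code point read directly, while the UTF-16 result shows the irregular extension: a character above 2^16 becomes a pair of 16-bit units.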

UTF-8 does not have this endianness problem, as today one mostly agrees on how to order the bits in a byte. It was invented for use on UNIX computers. It is constructed so that bytes with the highest bit 0 have the same value as in ASCII, and all other characters are multibyte, with the highest bit of every byte set to 1. It was adopted by Unicode, which imposed a limit on the number of characters. So strictly speaking, there are two UTF-8s: the original design and the restricted one adopted by Unicode.
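
A short Python sketch of those UTF-8 properties. The helper utf8_info is hypothetical, written only for this illustration, and it uses Python's built-in "utf-8" codec, which follows the Unicode-restricted form:

```python
def utf8_info(ch: str) -> tuple[str, bool]:
    """Return the UTF-8 bytes of ch as hex, and whether every byte has its high bit set."""
    encoded = ch.encode("utf-8")
    all_high = all(byte & 0x80 for byte in encoded)
    return encoded.hex(), all_high


print(utf8_info("A"))           # ('41', False): one byte, same value as ASCII
print(utf8_info("\u00e9"))      # ('c3a9', True): two bytes, high bits set
print(utf8_info("\U0001D11E"))  # ('f09d849e', True): four bytes, high bits set
```

So the hex value U+1D11E and the UTF-8 bytes f0 9d 84 9e are different things: the first is the code point, the second is one encoding of it.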


  Hans Aberg




_______________________________________________
bug-lilypond mailing list
bug-lilypond@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-lilypond
