pspad:
--------------------------------------------------------------------------------
This isn't a problem. But when it is a surrogate pair, I need to read 2
bytes and calculate the real char code. How can I tell whether it is a surrogate
pair? Is it a rule that a surrogate pair always starts with D8xx followed by
DCxx..DFxx? If that is the rule, I can take 2 bytes and calculate the char value.
--------------------------------------------------------------------------------


I believe it is a rule for valid UTF-16 encoding, except that the leading unit
can be anything from D800 to DBFF (not only D8xx); see:
http://unicode.org/faq/utf_bom.html#utf16-2

"
Q: What are surrogates?

A: Surrogates are code points from two special ranges of Unicode values,
reserved for use as the leading, and trailing values of paired code units in
UTF-16. Leading, also called high, surrogates are from D800₁₆ to DBFF₁₆, and
trailing, or low, surrogates are from DC00₁₆ to DFFF₁₆. They are called
surrogates, since they do not represent characters directly, but only as a
pair.
"

The next section of that page has further information about the conversion,
but it is probably the same as on the previously linked page.
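
For illustration, the range checks and the conversion could look roughly like
this. It is only a sketch in Python; the function names are made up for this
example and are not taken from any library or from PSPad. It works on 16-bit
code units that have already been read from the file with the right byte
order, so one surrogate pair is two units, i.e. four bytes.

```python
def is_high_surrogate(unit: int) -> bool:
    # Leading (high) surrogates: D800..DBFF
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    # Trailing (low) surrogates: DC00..DFFF
    return 0xDC00 <= unit <= 0xDFFF


def combine_surrogates(high: int, low: int) -> int:
    # Each surrogate carries 10 bits of the value; the result is offset
    # by 0x10000 because the pairs cover the code points above the BMP.
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, combine_surrogates(0xD83D, 0xDE00) gives 0x1F600.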

(The Unicode standard recommends treating invalid use of surrogates as
errors, but in a text editor I would rather keep them and report them as
individual characters.)
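
What I mean by keeping them could be sketched like this (Python again, with a
hypothetical function name; the flag in the result is just my assumption about
how an editor might want to mark such units). A high surrogate followed by a
low one is combined; any other surrogate is passed through as it is.

```python
def decode_utf16_units(units):
    """Turn a list of 16-bit code units into (code_point, is_lone_surrogate) tuples."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= unit <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid pair: combine both units into one code point.
            low = units[i + 1]
            result.append((0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00), False))
            i += 2
        else:
            # Ordinary BMP unit, or an unpaired surrogate that is kept as is
            # and only flagged so that it can be reported.
            result.append((unit, 0xD800 <= unit <= 0xDFFF))
            i += 1
    return result
```

This way nothing is lost when the file is loaded and saved again, and the
flagged units can still be shown to the user as invalid.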

vbr

-- 
<http://forum.pspad.com/read.php?2,64696,64705>
PSPad freeware editor http://www.pspad.com
