On 24/06/2009, at 1:57 AM, Paul Davis wrote:

> Are there byte order semantics for UTF-8?

No, UTF-8 is independent of byte ordering because its code unit is a single byte, so it's just a byte stream.

> Or other cases where sorting
> by UTF-8 binary representation is going to cause issues? Remember that
> the end goal is to create deterministic serializations for hashing.

Sorting over the UTF-8 bytes is fine for this.
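
To make that concrete, here is a rough sketch in Python of what I mean. It is not a proposal for the real implementation: canonical_digest is a made-up name, SHA-256 is an arbitrary choice, and it assumes a flat object whose values are JSON scalars.

```python
import hashlib
import json


def canonical_digest(doc):
    """Hash a flat dict deterministically (hypothetical helper, illustration only)."""
    # Order members by the raw UTF-8 bytes of each key, independent of
    # insertion order, locale, or collation rules.
    items = sorted(doc.items(), key=lambda kv: kv[0].encode("utf-8"))
    h = hashlib.sha256()  # assumption: any stable hash would do here
    for key, value in items:
        # ensure_ascii=False keeps non-ASCII text as literal UTF-8 once encoded.
        h.update(json.dumps(key, ensure_ascii=False).encode("utf-8"))
        h.update(b":")
        h.update(json.dumps(value, ensure_ascii=False).encode("utf-8"))
        h.update(b",")
    return h.hexdigest()
```

Two objects with the same members in a different insertion order then hash to the same digest, which is all the sort has to guarantee.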

> Sorting by code point doesn't seem like it'd get us anything other
> than added complexity.

Agreed, because you would have to deal with surrogates in UTF-16.
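
A small Python illustration of the surrogate problem, using two characters I picked for the purpose: comparing UTF-8 bytes gives the same order as comparing code points, while comparing UTF-16 code units does not once a character outside the BMP is involved.

```python
a = "\uFF5E"      # U+FF5E, inside the BMP
b = "\U0001F600"  # U+1F600, outside the BMP (a surrogate pair in UTF-16)

# Code point order: U+FF5E sorts before U+1F600.
code_point_order = ord(a) < ord(b)

# UTF-8 byte order agrees with code point order.
utf8_order = a.encode("utf-8") < b.encode("utf-8")

# UTF-16 order disagrees: the lead surrogate 0xD83D sorts below 0xFF5E.
# Big-endian is used so that byte comparison matches code unit comparison.
utf16_order = a.encode("utf-16-be") < b.encode("utf-16-be")
```

So with UTF-8 bytes you get code point order for free, and it is only a UTF-16 representation that needs special handling.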

> Patches welcome.

Of course. Let me qualify that by saying I have no time to do this; I'm merely taking part in a discussion.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

When I hear somebody sigh, 'Life is hard,' I am always tempted to ask, 'Compared to what?'
  -- Sydney Harris

