Re: string encoding

Hong Zhang Fri, 16 Feb 2001 16:47:06 -0800
> > I think you already mixed the codepoint vs character. What you will get is
> > 10th codepoint, not 10th character.
>
> I think you're confused. Codepoints *are* characters. Combining characters are
> taken care of as per the RFC.

If you define it that way, I can agree with it. But since you still have to
handle combining characters somewhere else, you will not save much overall.
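
To make the distinction concrete, here is a minimal sketch in Python (not
from this thread). The helper name `group_combining` is hypothetical, and
grouping by `unicodedata.combining` only approximates what a user would call
a "character"; full grapheme clustering has more rules.

```python
import unicodedata

def group_combining(text):
    """Attach each combining mark to the base codepoint before it.

    Assumption: a "character" is a base codepoint plus any following
    codepoints with a non-zero canonical combining class.
    """
    units = []
    for ch in text:
        if units and unicodedata.combining(ch) != 0:
            units[-1] += ch      # combining mark: extend the previous unit
        else:
            units.append(ch)     # base codepoint: start a new unit
    return units

s = "e\u0301a"                   # 'e' + COMBINING ACUTE ACCENT + 'a'
codepoint_count = len(s)                      # counts codepoints
character_count = len(group_combining(s))     # counts grouped units
```

Indexing by codepoint gives the accent on its own as the second item, so
code that wants the second visible character still has to do the grouping
pass.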

> I'm talking about UTF16. You're talking about UTF32.
> Try talking about what I'm talking about.

With UTF-16, you have to handle surrogates, right? It is still a
variable-length encoding. At this time, the surrogates are undefined. If they
come into wide use, the nightmare will come back.
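
A small sketch of why surrogates break constant-time indexing, again in
Python and again only as an illustration. The function names
`to_utf16_units` and `utf16_index_of_codepoint` are hypothetical, and the
input is assumed to be well-formed UTF-16.

```python
def to_utf16_units(text):
    """Encode text as a list of 16-bit UTF-16 code units (big-endian)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[k:k + 2], "big") for k in range(0, len(data), 2)]

def utf16_index_of_codepoint(units, n):
    """Return the code-unit index where the n-th codepoint (0-based) starts.

    This is a linear scan: a high surrogate (0xD800-0xDBFF) followed by a
    low surrogate (0xDC00-0xDFFF) counts as one codepoint in two units.
    """
    i = 0
    for _ in range(n):
        if i >= len(units):
            raise IndexError("codepoint index out of range")
        is_pair = (
            0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        i += 2 if is_pair else 1
    if i >= len(units):
        raise IndexError("codepoint index out of range")
    return i

# 'a', U+10400 (outside the BMP, needs a surrogate pair), 'b'
units = to_utf16_units("a\U00010400b")
third = utf16_index_of_codepoint(units, 2)   # start of 'b' in code units
```

Without surrogates the n-th codepoint is simply `units[n]`; once pairs
appear, the offset depends on everything before it.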

> > I said it is not common case
>
> And I am saying that it is.
>
> I have been through this many, many times. I am not going through it
> again.

What I can see is that you argue random access is important and
nice to have. But I don't see that it is the common case. Can you name
some practical text algorithms or usages in Perl that need it? I think
Perl is not a language designed for character-by-character text
processing. As long as regexps are fast enough, most people will be
happy.

Hong