Simon Cozens wrote:
> 
>...
> 
> ... That's to say, if I want to get the length of
> the string in characters, I call string_length(STRING* foo) which looks
> at foo's encoding and calls the appropriate length-getting function for
> that encoding.

At some point the engine has two strings in different encodings and is
asked to concatenate them, and it can't ask either of them to do the job
itself. It needs semantics for the concatenation, and the only standards
that have tried to deal with this are Unicode and ISO 2022, the latter
of which seems dead.
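
To make the concatenation problem concrete, here is a minimal sketch in
C. The Str type, the two-encoding enum and the function names are
hypothetical, invented for this illustration; they are not the
interpreter's actual STRING layout. It assumes only Latin-1 and UTF-8
exist and that the engine always picks UTF-8 for the result, which is
exactly the kind of policy neither operand can decide on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only. */
typedef enum { ENC_LATIN1, ENC_UTF8 } Encoding;

typedef struct {
    Encoding       enc;
    size_t         len;    /* length in bytes, not characters */
    unsigned char *bytes;  /* owned by the Str */
} Str;

/* Return a fresh UTF-8 copy of s. Latin-1 bytes map to the Unicode code
   points of the same value, so each needs at most two UTF-8 bytes.
   Input already tagged UTF-8 is copied unchanged (not validated). */
static Str to_utf8(const Str *s)
{
    Str out = { ENC_UTF8, 0, malloc(2 * s->len + 1) };
    assert(out.bytes != NULL);
    for (size_t i = 0; i < s->len; i++) {
        unsigned char c = s->bytes[i];
        if (s->enc == ENC_UTF8 || c < 0x80) {
            out.bytes[out.len++] = c;
        } else {
            out.bytes[out.len++] = (unsigned char)(0xC0 | (c >> 6));
            out.bytes[out.len++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}

/* Concatenate two strings of possibly different encodings. The engine,
   not either operand, chooses the result encoding (assumed: UTF-8). */
static Str str_concat(const Str *a, const Str *b)
{
    Str ua = to_utf8(a), ub = to_utf8(b);
    Str r = { ENC_UTF8, ua.len + ub.len, malloc(ua.len + ub.len + 1) };
    assert(r.bytes != NULL);
    memcpy(r.bytes, ua.bytes, ua.len);
    memcpy(r.bytes + ua.len, ub.bytes, ub.len);
    r.bytes[r.len] = '\0';
    free(ua.bytes);
    free(ub.bytes);
    return r;
}
```

Simply appending the bytes of a Latin-1 string to a UTF-8 one would
produce a buffer that is valid in neither encoding, so some such
transcoding step has to live in the engine.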

Another example is when one string is a regular expression (in one
encoding) and the other is a string to match against (in another
encoding).
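
A rough sketch of the matching case, again in C, with a literal
substring search standing in for a real regular expression engine. The
names next_cp, match_here and contains are made up for this example, and
the UTF-8 decoder does only minimal validation. The point is that the
comparison has to happen on code points from one shared character set,
not on the raw bytes of either encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags for illustration only. */
typedef enum { ENC_LATIN1, ENC_UTF8 } Encoding;

/* Decode one code point from p (n bytes available) into *cp.
   Returns the number of bytes consumed, or 0 at end of input or on
   malformed input. Overlong forms are not rejected in this sketch. */
static size_t next_cp(Encoding enc, const unsigned char *p, size_t n,
                      unsigned long *cp)
{
    assert(cp != NULL);
    if (n == 0) return 0;
    if (enc == ENC_LATIN1 || p[0] < 0x80) { *cp = p[0]; return 1; }
    size_t need = (p[0] >= 0xF0) ? 4 : (p[0] >= 0xE0) ? 3
                : (p[0] >= 0xC0) ? 2 : 0;
    if (need == 0 || need > n) return 0;
    unsigned long v = p[0] & (0xFF >> (need + 1));
    for (size_t i = 1; i < need; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        v = (v << 6) | (p[i] & 0x3F);
    }
    *cp = v;
    return need;
}

/* Does the pattern match at the very start of the text? Both sides are
   decoded to code points before being compared. */
static int match_here(Encoding te, const unsigned char *t, size_t tn,
                      Encoding pe, const unsigned char *p, size_t pn)
{
    while (pn > 0) {
        unsigned long a, b;
        size_t ta = next_cp(te, t, tn, &a);
        size_t pb = next_cp(pe, p, pn, &b);
        if (ta == 0 || pb == 0 || a != b) return 0;
        t += ta; tn -= ta;
        p += pb; pn -= pb;
    }
    return 1;
}

/* Returns 1 if the pattern occurs anywhere in the text, else 0. */
static int contains(Encoding te, const unsigned char *t, size_t tn,
                    Encoding pe, const unsigned char *p, size_t pn)
{
    for (;;) {
        if (match_here(te, t, tn, pe, p, pn)) return 1;
        unsigned long skipped;
        size_t step = next_cp(te, t, tn, &skipped);
        if (step == 0) return 0;               /* end of text */
        t += step; tn -= step;
    }
}
```

A byte-wise search would miss a Latin-1 pattern containing 0xEF inside
UTF-8 text where the same character is stored as 0xC3 0xAF; decoding
both sides first is what makes them comparable.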

> So while the interpreter doesn't have to care about string encodings, at
> some point this has to bottom out and you have to get down and implement
> encoding-aware functions.

If the interpreter has a built-in concept of regular expressions or
string concatenation (rather than dispatching these to the types) then
it needs to have a built-in understanding of the semantics of combining
encodings. I don't think you can define that without "standardizing"
on Unicode or some other unifying character set.

 Paul Prescod
