Once again, we're going with vtables, as strings originally had. Each string has two vtable pointers: one for its encoding and one for its charset. Encodings and charsets will be loadable libraries, living in the encodings/ and charset/ directories.
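For anyone who wants something concrete to squint at, here's a rough C sketch of the string header with its two vtable pointers. Everything here is hypothetical: the type names (STRING, ENCODING, CHARSET), the field names, and the find_encoding()/find_charset() helpers are made up for illustration. The vtables only carry a name, since their slots are the open question. A real build would presumably dlopen() libraries out of encodings/ and charset/; a static table stands in for that here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vtables: only a name is shown; the function slots
 * are what's still being hashed out. */
typedef struct ENCODING { const char *name; } ENCODING;
typedef struct CHARSET  { const char *name; } CHARSET;

/* Each string carries two independent vtable pointers. */
typedef struct STRING {
    const ENCODING *encoding;  /* how bytes map to codepoints */
    const CHARSET  *charset;   /* how codepoints map to graphemes, case, etc. */
    unsigned char  *buf;       /* raw bytes, borrowed in this sketch */
    size_t          bufused;   /* number of bytes in use */
} STRING;

/* Stand-in for loadable libraries: a real version would load these by
 * name from encodings/ and charset/ instead of using static tables. */
static const ENCODING known_encodings[] = { {"fixed_8"}, {"utf8"} };
static const CHARSET  known_charsets[]  = { {"ascii"}, {"unicode"} };

static const ENCODING *find_encoding(const char *name) {
    for (size_t i = 0; i < sizeof known_encodings / sizeof *known_encodings; i++)
        if (strcmp(known_encodings[i].name, name) == 0)
            return &known_encodings[i];
    return NULL;  /* not available */
}

static const CHARSET *find_charset(const char *name) {
    for (size_t i = 0; i < sizeof known_charsets / sizeof *known_charsets; i++)
        if (strcmp(known_charsets[i].name, name) == 0)
            return &known_charsets[i];
    return NULL;  /* not available */
}

/* Build a string header; the byte buffer is borrowed, not copied. */
static STRING make_string(unsigned char *buf, size_t len,
                          const char *enc, const char *cs) {
    STRING s = { find_encoding(enc), find_charset(cs), buf, len };
    return s;
}
```

The point of keeping the two pointers separate is that any encoding can in principle be paired with any charset, so neither vtable needs to know about the other.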
The encoding vtable needs to handle get/set codepoint, get/set byte, and transform to another encoding. I don't think there's anything else, but I could be wrong there.
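A minimal C sketch of what such an encoding vtable might look like, assuming fixed-width encodings only (variable-width ones like UTF-8 would need more machinery than fits here). The slot names, the two sample encodings (fixed_8 and a little-endian fixed_32), and the generic transform that round-trips through codepoints are all assumptions for illustration, not a settled API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical encoding vtable; slot names are illustrative. */
typedef struct ENCODING ENCODING;
struct ENCODING {
    const char *name;
    size_t width;  /* bytes per codepoint: this sketch handles fixed-width only */
    uint32_t (*get_codepoint)(const unsigned char *buf, size_t idx);
    void (*set_codepoint)(unsigned char *buf, size_t idx, uint32_t cp);
    unsigned char (*get_byte)(const unsigned char *buf, size_t idx);
    void (*set_byte)(unsigned char *buf, size_t idx, unsigned char b);
    unsigned char *(*transform)(const ENCODING *self, const unsigned char *src,
                                size_t nbytes, const ENCODING *to, size_t *out_nbytes);
};

/* Byte access ignores codepoint boundaries, so it's the same everywhere. */
static unsigned char raw_get_byte(const unsigned char *buf, size_t idx) { return buf[idx]; }
static void raw_set_byte(unsigned char *buf, size_t idx, unsigned char b) { buf[idx] = b; }

/* fixed_8: one byte per codepoint, so only codepoints 0..255 fit. */
static uint32_t f8_get(const unsigned char *buf, size_t idx) { return buf[idx]; }
static void f8_set(unsigned char *buf, size_t idx, uint32_t cp) {
    assert(cp <= 0xFF);  /* a real version would raise an error instead */
    buf[idx] = (unsigned char)cp;
}

/* fixed_32: four bytes per codepoint, stored little-endian. */
static uint32_t f32_get(const unsigned char *buf, size_t idx) {
    const unsigned char *p = buf + 4 * idx;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static void f32_set(unsigned char *buf, size_t idx, uint32_t cp) {
    unsigned char *p = buf + 4 * idx;
    p[0] = (unsigned char)(cp & 0xFF);
    p[1] = (unsigned char)((cp >> 8) & 0xFF);
    p[2] = (unsigned char)((cp >> 16) & 0xFF);
    p[3] = (unsigned char)((cp >> 24) & 0xFF);
}

/* Generic transform: decode each codepoint with the source vtable and
 * re-encode it with the target's. The caller frees the result. */
static unsigned char *generic_transform(const ENCODING *self, const unsigned char *src,
                                        size_t nbytes, const ENCODING *to, size_t *out_nbytes) {
    size_t n = nbytes / self->width;               /* number of codepoints */
    unsigned char *dst = malloc(n * to->width + 1); /* +1 keeps empty strings valid */
    if (dst == NULL) return NULL;
    for (size_t i = 0; i < n; i++)
        to->set_codepoint(dst, i, self->get_codepoint(src, i));
    *out_nbytes = n * to->width;
    return dst;
}

static const ENCODING fixed_8  = { "fixed_8", 1, f8_get, f8_set,
                                   raw_get_byte, raw_set_byte, generic_transform };
static const ENCODING fixed_32 = { "fixed_32", 4, f32_get, f32_set,
                                   raw_get_byte, raw_set_byte, generic_transform };
```

Going through codepoints makes any-to-any transforms work with just the get/set slots; specific encoding pairs could still override transform with something faster.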
The charset vtable needs to handle get/set grapheme, get/set substring, up/down/titlecase, and (possibly) comparison. Charsets also have a separate grapheme classification requirement (for regexes) but we'll put that off for now.
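Along the same lines, here's a rough C sketch of a charset vtable with an ASCII implementation. It assumes one grapheme per byte (true for ASCII, not for Unicode), same-length substring replacement only, and comparison by raw code value. The slot names, signatures, and the `ascii` instance are all hypothetical, and grapheme classification is left out as above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical charset vtable; slot names are illustrative. Graphemes are
 * passed as uint32_t, which only covers single-codepoint graphemes. */
typedef struct CHARSET {
    const char *name;
    uint32_t (*get_grapheme)(const unsigned char *buf, size_t idx);
    void (*set_grapheme)(unsigned char *buf, size_t idx, uint32_t g);
    void (*get_substring)(const unsigned char *buf, size_t start, size_t count, unsigned char *out);
    void (*set_substring)(unsigned char *buf, size_t start, size_t count, const unsigned char *in);
    void (*upcase)(unsigned char *buf, size_t len);
    void (*downcase)(unsigned char *buf, size_t len);
    void (*titlecase)(unsigned char *buf, size_t len);
    int (*compare)(const unsigned char *a, size_t alen, const unsigned char *b, size_t blen);
} CHARSET;

/* ASCII helpers written out by hand so the result doesn't depend on locale. */
static int ascii_isalpha(unsigned char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
static unsigned char ascii_up(unsigned char c) { return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 32) : c; }
static unsigned char ascii_down(unsigned char c) { return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + 32) : c; }

/* In ASCII every grapheme is exactly one byte. */
static uint32_t ascii_get_grapheme(const unsigned char *buf, size_t idx) { return buf[idx]; }
static void ascii_set_grapheme(unsigned char *buf, size_t idx, uint32_t g) {
    assert(g < 0x80);  /* not representable in ASCII otherwise */
    buf[idx] = (unsigned char)g;
}
static void ascii_get_substring(const unsigned char *buf, size_t start, size_t count, unsigned char *out) {
    memcpy(out, buf + start, count);
}
/* Same-length replacement only; growing or shrinking the string is left out. */
static void ascii_set_substring(unsigned char *buf, size_t start, size_t count, const unsigned char *in) {
    memcpy(buf + start, in, count);
}
static void ascii_upcase(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) buf[i] = ascii_up(buf[i]);
}
static void ascii_downcase(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) buf[i] = ascii_down(buf[i]);
}
/* Uppercase the first letter of each run of letters, lowercase the rest. */
static void ascii_titlecase(unsigned char *buf, size_t len) {
    int in_word = 0;
    for (size_t i = 0; i < len; i++) {
        buf[i] = in_word ? ascii_down(buf[i]) : ascii_up(buf[i]);
        in_word = ascii_isalpha(buf[i]);
    }
}
/* Lexicographic by code value; a proper prefix sorts first. Returns -1, 0, or 1. */
static int ascii_compare(const unsigned char *a, size_t alen, const unsigned char *b, size_t blen) {
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c != 0) return c < 0 ? -1 : 1;
    return (alen > blen) - (alen < blen);
}

static const CHARSET ascii = {
    "ascii", ascii_get_grapheme, ascii_set_grapheme, ascii_get_substring,
    ascii_set_substring, ascii_upcase, ascii_downcase, ascii_titlecase, ascii_compare
};
```

For a Unicode charset, graphemes can span several codepoints and casing can change length, so the uint32_t grapheme slot and same-length substring replacement here are exactly the kind of thing that would need rethinking.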
I think that's it, but before we nail these down I'd like folks to squint at this a bit so we can make sure it's right. Once it is, we can define the API directly and start implementing it.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk