Strings internals

Dan Sugalski Wed, 16 Jun 2004 09:01:04 -0700

Okay, now that we've got the bytecode-visible stuff specified, I want to spec the internals some and start getting things migrated over to the new design. (This should also let us make ICU optional, for folks who only want ASCII/Latin-x/EBCDIC enabled.)

Once again, we're going with vtables, like the strings originally had. Each string has two vtable pointers, one for encoding and one for charset. Encodings and charsets'll be loadable libraries, in the encodings/ and charset/ directories.
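To make the "two vtable pointers" idea concrete, here is a minimal sketch in C. Every name in it (sketch_string, sketch_encoding, sketch_charset, sketch_string_bind, the fixed_8/ascii/iso-8859-1 tables) is hypothetical and not taken from any agreed API. The vtables are reduced to a name field so that only the pairing is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Every name here is hypothetical; the post does not define these structs. */
typedef struct sketch_encoding { const char *name; } sketch_encoding;
typedef struct sketch_charset  { const char *name; } sketch_charset;

/* A string carries its bytes plus one pointer per vtable. */
typedef struct {
    unsigned char *buf;               /* caller-owned storage */
    size_t bytelen;                   /* number of bytes in buf */
    const sketch_encoding *encoding;  /* how bytes map to codepoints */
    const sketch_charset *charset;    /* what the codepoints mean */
} sketch_string;

/* Stand-ins for tables that would come from loadable libraries. */
static const sketch_encoding fixed8_encoding = { "fixed_8" };
static const sketch_charset ascii_charset = { "ascii" };
static const sketch_charset latin1_charset = { "iso-8859-1" };

/* Bind a caller-owned buffer to an encoding/charset pair. */
static sketch_string sketch_string_bind(unsigned char *buf, size_t bytelen,
                                        const sketch_encoding *enc,
                                        const sketch_charset *cs) {
    assert(buf != NULL && enc != NULL && cs != NULL);
    sketch_string s = { buf, bytelen, enc, cs };
    return s;
}
```

Because the two pointers are independent, the same single-byte encoding can sit under either charset; switching from ascii_charset to latin1_charset changes how the bytes are interpreted without touching the bytes themselves.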

The encoding vtable needs to handle get/set codepoint, get/set byte, and transform to another encoding. I don't think there's anything else, but I could be wrong there.
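One possible shape for that encoding vtable, sketched in C. All names and signatures are assumptions made for illustration, and the two fixed-width encodings (one byte per codepoint, and two bytes per codepoint in native byte order) are stand-ins, not proposed encodings. The string struct is trimmed to a single vtable pointer so the sketch stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and signatures below are hypothetical. */
typedef struct sketch_encoding sketch_encoding;

typedef struct {
    unsigned char *buf;               /* caller-owned storage */
    size_t bytelen;                   /* number of bytes in buf */
    const sketch_encoding *encoding;
} sketch_string;

struct sketch_encoding {
    const char *name;
    uint32_t (*get_codepoint)(const sketch_string *s, size_t index);
    int (*set_codepoint)(sketch_string *s, size_t index, uint32_t cp); /* 0 or -1 */
    unsigned char (*get_byte)(const sketch_string *s, size_t offset);
    void (*set_byte)(sketch_string *s, size_t offset, unsigned char b);
    int (*to_encoding)(const sketch_string *src, sketch_string *dst);  /* 0 or -1 */
};

/* Byte access is the same for both encodings. */
static unsigned char raw_get_byte(const sketch_string *s, size_t offset) {
    assert(offset < s->bytelen);
    return s->buf[offset];
}
static void raw_set_byte(sketch_string *s, size_t offset, unsigned char b) {
    assert(offset < s->bytelen);
    s->buf[offset] = b;
}

/* Fixed 8-bit: one byte per codepoint. */
static uint32_t fixed8_get_codepoint(const sketch_string *s, size_t index) {
    assert(index < s->bytelen);
    return s->buf[index];
}
static int fixed8_set_codepoint(sketch_string *s, size_t index, uint32_t cp) {
    if (index >= s->bytelen || cp > 0xFF) return -1;
    s->buf[index] = (unsigned char)cp;
    return 0;
}

/* Fixed 16-bit: two bytes per codepoint, native byte order. */
static uint32_t fixed16_get_codepoint(const sketch_string *s, size_t index) {
    uint16_t unit;
    assert(index < s->bytelen / 2);
    memcpy(&unit, s->buf + index * 2, sizeof unit);
    return unit;
}
static int fixed16_set_codepoint(sketch_string *s, size_t index, uint32_t cp) {
    uint16_t unit = (uint16_t)cp;
    if (index >= s->bytelen / 2 || cp > 0xFFFF) return -1;
    memcpy(s->buf + index * 2, &unit, sizeof unit);
    return 0;
}

/* Transform by copying codepoints through the two vtables. On failure
   dst may already hold the codepoints copied before the failing one. */
static int copy_codepoints(const sketch_string *src, size_t count,
                           sketch_string *dst) {
    for (size_t i = 0; i < count; i++) {
        uint32_t cp = src->encoding->get_codepoint(src, i);
        if (dst->encoding->set_codepoint(dst, i, cp) != 0) return -1;
    }
    return 0;
}
static int fixed8_to_encoding(const sketch_string *src, sketch_string *dst) {
    return copy_codepoints(src, src->bytelen, dst);
}
static int fixed16_to_encoding(const sketch_string *src, sketch_string *dst) {
    return copy_codepoints(src, src->bytelen / 2, dst);
}

static const sketch_encoding fixed8_encoding = {
    "fixed_8", fixed8_get_codepoint, fixed8_set_codepoint,
    raw_get_byte, raw_set_byte, fixed8_to_encoding
};
static const sketch_encoding fixed16_encoding = {
    "fixed_16", fixed16_get_codepoint, fixed16_set_codepoint,
    raw_get_byte, raw_set_byte, fixed16_to_encoding
};
```

In this sketch the transform needs nothing beyond get/set codepoint on each side, which is one argument for the list above being complete. A variable-width encoding would probably also want a way to report its codepoint count, which the fixed-width cases here get for free from the byte length.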

The charset vtable needs to handle get/set grapheme, get/set substring, up/down/titlecase, and (possibly) comparison. Charsets also have a separate grapheme classification requirement (for regexes) but we'll put that off for now.
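A matching sketch of the charset vtable in C, filled in for plain ASCII where a grapheme is always a single byte. Names and signatures are again hypothetical. Three choices here are assumptions of the sketch and not part of the proposal: get_substring returns a view into the original buffer, set_substring overwrites in place without resizing, and titlecase only uppercases the first grapheme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names and signatures below are hypothetical. */
typedef struct sketch_charset sketch_charset;

typedef struct {
    unsigned char *buf;            /* caller-owned storage */
    size_t len;                    /* bytes; one byte per ASCII grapheme */
    const sketch_charset *charset;
} sketch_string;

struct sketch_charset {
    const char *name;
    unsigned (*get_grapheme)(const sketch_string *s, size_t i);
    int (*set_grapheme)(sketch_string *s, size_t i, unsigned g);
    int (*get_substring)(const sketch_string *s, size_t start, size_t n,
                         sketch_string *out);
    int (*set_substring)(sketch_string *s, size_t start,
                         const sketch_string *src);
    void (*upcase)(sketch_string *s);
    void (*downcase)(sketch_string *s);
    void (*titlecase)(sketch_string *s);
    int (*compare)(const sketch_string *a, const sketch_string *b);
};

static unsigned char ascii_upper(unsigned char c) {
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}
static unsigned char ascii_lower(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

static unsigned ascii_get_grapheme(const sketch_string *s, size_t i) {
    assert(i < s->len);
    return s->buf[i];
}
static int ascii_set_grapheme(sketch_string *s, size_t i, unsigned g) {
    if (i >= s->len || g > 0x7F) return -1;   /* not representable in ASCII */
    s->buf[i] = (unsigned char)g;
    return 0;
}
/* Returns a view that shares storage with s (sketch assumption). */
static int ascii_get_substring(const sketch_string *s, size_t start, size_t n,
                               sketch_string *out) {
    if (start > s->len || n > s->len - start) return -1;
    out->buf = s->buf + start;
    out->len = n;
    out->charset = s->charset;
    return 0;
}
/* Overwrites in place; never grows or shrinks s (sketch assumption). */
static int ascii_set_substring(sketch_string *s, size_t start,
                               const sketch_string *src) {
    if (start > s->len || src->len > s->len - start) return -1;
    memmove(s->buf + start, src->buf, src->len);
    return 0;
}
static void ascii_upcase(sketch_string *s) {
    for (size_t i = 0; i < s->len; i++) s->buf[i] = ascii_upper(s->buf[i]);
}
static void ascii_downcase(sketch_string *s) {
    for (size_t i = 0; i < s->len; i++) s->buf[i] = ascii_lower(s->buf[i]);
}
/* Uppercases only the first grapheme (sketch assumption). */
static void ascii_titlecase(sketch_string *s) {
    if (s->len > 0) s->buf[0] = ascii_upper(s->buf[0]);
}
/* Bytewise ordering; a shorter string sorts before its extensions. */
static int ascii_compare(const sketch_string *a, const sketch_string *b) {
    size_t n = a->len < b->len ? a->len : b->len;
    int r = memcmp(a->buf, b->buf, n);
    if (r != 0) return r < 0 ? -1 : 1;
    return (a->len > b->len) - (a->len < b->len);
}

static const sketch_charset ascii_charset = {
    "ascii", ascii_get_grapheme, ascii_set_grapheme,
    ascii_get_substring, ascii_set_substring,
    ascii_upcase, ascii_downcase, ascii_titlecase, ascii_compare
};
```

For ASCII the comparison is just a bytewise memcmp, which is part of why it is only "possibly" a charset job; a charset with real collation rules would need far more than this.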

I think those are it, but before we nail them down I'd like to have folks squint at this a bit so we can make sure it's right. Once it is, we can define the API directly and start implementing it.

-- Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
