> If the string API is to be revised, I would like to suggest that consideration be given to having a single string vtable, merging the current encoding and chartype structures into a single one.
I think this has been addressed, but in case it hasn't... while I'd love to have a single unified scheme, there's an NxM problem here: with one table per encoding/character-set pairing, N encodings and M character sets need N*M tables, whereas keeping them separate needs only N+M. There are relatively few types of bytestream<->character mappings, which is what the encoding handles. Fixed-width 8-, 16-, and 32-bit integers, UTF-8, UTF-16 (which is variable width), Big5, and Shift-JIS spring to mind. There are also a *lot* of different character sets for at least some of those encodings. (There are many different 8-bit character sets, a few variants of Big5, and IIRC a couple of different ways to use the UTF-8 and -16 encodings.)
Splitting them up means less work for whoever writes the character set translations, as well as for whoever writes the encoding translations. I can, for example, write a Big5->16-bit fixed-width transform, but I'd be hard-pressed to define the different character set elements for it. We can fairly easily ship a set of encodings as part of Parrot and leave the actual character set work for most sets to third parties, which strikes me as the way to go.
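To make the split concrete, here is a minimal C sketch of the two kinds of table. Every name in it (`encoding_vtable`, `charset_vtable`, `sketch_string`, `count_digits`, and the individual tables) is hypothetical and not Parrot's actual string API, and the UTF-8 decoder assumes well-formed input. The encoding table only finds code points in the bytes; the character set table only says what a code point means, so each is written once and any encoding can be paired with any character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: these names are illustrative, not Parrot's real API. */

/* Encoding: how code points are laid out as bytes. */
typedef struct {
    const char *name;
    /* Decode the code point at byte offset *pos and advance *pos past it. */
    unsigned long (*decode)(const unsigned char *buf, size_t *pos);
} encoding_vtable;

/* Character set: what a given code point means. */
typedef struct {
    const char *name;
    int (*is_digit)(unsigned long cp);
} charset_vtable;

/* A string carries one pointer to each table. */
typedef struct {
    const unsigned char *buf;
    size_t bytelen;
    const encoding_vtable *encoding;
    const charset_vtable *charset;
} sketch_string;

static unsigned long fixed8_decode(const unsigned char *buf, size_t *pos) {
    return buf[(*pos)++];
}

/* Minimal UTF-8 decoder; assumes well-formed input (no validation). */
static unsigned long utf8_decode(const unsigned char *buf, size_t *pos) {
    unsigned char b = buf[(*pos)++];
    unsigned long cp;
    int extra;
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else                         { cp = b & 0x07; extra = 3; }
    while (extra-- > 0)
        cp = (cp << 6) | (buf[(*pos)++] & 0x3F);
    return cp;
}

/* ISO-8859-1: only the ASCII digits are digits. */
static int latin1_is_digit(unsigned long cp) {
    return cp >= 0x30 && cp <= 0x39;
}

/* Unicode (partial): ASCII digits plus Arabic-Indic digits U+0660..U+0669. */
static int unicode_is_digit(unsigned long cp) {
    return (cp >= 0x30 && cp <= 0x39) || (cp >= 0x660 && cp <= 0x669);
}

static const encoding_vtable fixed8_encoding = { "fixed_8", fixed8_decode };
static const encoding_vtable utf8_encoding   = { "utf8", utf8_decode };
static const charset_vtable  latin1_charset  = { "iso-8859-1", latin1_is_digit };
static const charset_vtable  unicode_charset = { "unicode", unicode_is_digit };

/* Generic code: the encoding finds code points, the charset classifies them. */
static size_t count_digits(const sketch_string *s) {
    size_t pos = 0, digits = 0;
    assert(s->encoding != NULL && s->charset != NULL);
    while (pos < s->bytelen) {
        unsigned long cp = s->encoding->decode(s->buf, &pos);
        if (s->charset->is_digit(cp))
            digits++;
    }
    return digits;
}
```

Swapping only the charset pointer changes what counts as a digit without touching the decoder, and vice versa.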
Yeah, it does mean one more pointer per string, which isn't great, but it means fewer tables: when a string changes encodings but not character sets, we don't need a whole new table that mushes together the character set and the encoding.
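A minimal C sketch of what changing encodings but not character sets looks like with split tables: the bytes are re-encoded and the encoding pointer is swapped, while the charset pointer is never touched. All names (`change_encoding`, `sketch_string`, the `fixed_8`/`fixed_16` tables) are hypothetical, not Parrot's API; the sketch assumes the string owns a heap-allocated buffer and, purely for illustration, stores 16-bit units little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: names are illustrative, not Parrot's real API. */
typedef struct {
    const char *name;
    size_t max_bytes; /* widest encoded form of a single code point */
    unsigned long (*decode)(const unsigned char *buf, size_t *pos);
    size_t (*encode)(unsigned long cp, unsigned char *out); /* returns bytes written */
} encoding_vtable;

/* The charset's contents don't matter here; only its identity does. */
typedef struct {
    const char *name;
} charset_vtable;

typedef struct {
    unsigned char *buf; /* heap-allocated, owned by the string */
    size_t bytelen;
    const encoding_vtable *encoding;
    const charset_vtable *charset;
} sketch_string;

static unsigned long fixed8_decode(const unsigned char *buf, size_t *pos) {
    return buf[(*pos)++];
}
static size_t fixed8_encode(unsigned long cp, unsigned char *out) {
    out[0] = (unsigned char)cp; /* assumes cp <= 0xFF */
    return 1;
}

/* Fixed 16-bit units, little-endian (an assumption of this sketch). */
static unsigned long fixed16_decode(const unsigned char *buf, size_t *pos) {
    unsigned long cp = buf[*pos] | ((unsigned long)buf[*pos + 1] << 8);
    *pos += 2;
    return cp;
}
static size_t fixed16_encode(unsigned long cp, unsigned char *out) {
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    return 2;
}

static const encoding_vtable fixed8_encoding  = { "fixed_8", 1, fixed8_decode, fixed8_encode };
static const encoding_vtable fixed16_encoding = { "fixed_16", 2, fixed16_decode, fixed16_encode };

/* Re-encode the buffer and swap the encoding pointer. The charset pointer
   is left as-is, so no combined encoding+charset table is needed.
   Returns 0 on success, -1 if allocation fails. */
static int change_encoding(sketch_string *s, const encoding_vtable *to) {
    /* Every code point uses at least one source byte, so this bounds the output. */
    unsigned char *out = malloc(s->bytelen * to->max_bytes + 1);
    size_t in_pos = 0, out_len = 0;
    if (out == NULL)
        return -1;
    while (in_pos < s->bytelen) {
        unsigned long cp = s->encoding->decode(s->buf, &in_pos);
        out_len += to->encode(cp, out + out_len);
    }
    free(s->buf);
    s->buf = out;
    s->bytelen = out_len;
    s->encoding = to;
    return 0;
}
```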
It may turn out that we have relatively few encoding/character-set combinations, in which case we can revisit that decision and go with an implementation that presents two tables conceptually but uses a single unified table internally.
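One possible shape for "two tables conceptually, one table in the implementation", sketched in C with hypothetical names (`string_vtable`, `build_string_vtable`, and the rest are not Parrot's API): the encoding and character set tables are still written separately, and a combined table is built from a given pair so each string carries a single pointer and dispatch is one hop. A real version would presumably build and cache one combined table per pair actually in use; this sketch just builds one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: names are illustrative, not Parrot's real API. */
typedef unsigned long (*decode_fn)(const unsigned char *buf, size_t *pos);
typedef int (*is_digit_fn)(unsigned long cp);

/* The two conceptual tables, still written separately. */
typedef struct { const char *name; decode_fn decode; } encoding_vtable;
typedef struct { const char *name; is_digit_fn is_digit; } charset_vtable;

/* The unified table the implementation actually dispatches through. */
typedef struct {
    const encoding_vtable *encoding; /* kept so the halves can be recovered */
    const charset_vtable *charset;
    decode_fn decode;                /* flattened copies for one-hop dispatch */
    is_digit_fn is_digit;
} string_vtable;

/* A string now carries a single table pointer. */
typedef struct {
    const unsigned char *buf;
    size_t bytelen;
    const string_vtable *vtable;
} sketch_string;

static void build_string_vtable(string_vtable *vt,
                                const encoding_vtable *e,
                                const charset_vtable *c) {
    assert(e != NULL && c != NULL);
    vt->encoding = e;
    vt->charset = c;
    vt->decode = e->decode;
    vt->is_digit = c->is_digit;
}

static unsigned long fixed8_decode(const unsigned char *buf, size_t *pos) {
    return buf[(*pos)++];
}
static int ascii_is_digit(unsigned long cp) {
    return cp >= 0x30 && cp <= 0x39;
}

static const encoding_vtable fixed8_encoding = { "fixed_8", fixed8_decode };
static const charset_vtable  ascii_charset   = { "ascii", ascii_is_digit };

/* Same generic operation, dispatched through the single combined table. */
static size_t count_digits(const sketch_string *s) {
    size_t pos = 0, digits = 0;
    while (pos < s->bytelen)
        if (s->vtable->is_digit(s->vtable->decode(s->buf, &pos)))
            digits++;
    return digits;
}
```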
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk