At 12:57 PM +0200 8/21/03, Peter Gibbs wrote:
> If the string API is to be revised, I would like to suggest that
> consideration be given to having a single string vtable, merging
> the current encoding and chartype structures into a single one.

I think this has been addressed, but in case it hasn't... while I'd love to have a single unified scheme, there is an NxM problem here. There are relatively few types of bytestream<->character mappings, which is what the encoding handles: 8-, 16-, and 32-bit integers, UTF-8, UTF-16 (which is variable width), Big5, and Shift-JIS spring to mind. There are also a *lot* of different character sets for at least some of those encodings. (There are many different 8-bit character sets, a few variants of Big5, and IIRC a couple of different ways to use the UTF-8 and UTF-16 encodings.)


Splitting them up means less work for whoever's writing the character set translations, as well as for whoever writes the encoding translations. I can, for example, write a Big5->16-bit fixed width transform, but I'd be hard pressed to define the different character set elements for it. We can reasonably easily provide a set of encodings as part of parrot, and leave the actual character set stuff for most sets to third-party folks, which strikes me as the way to go.

Yeah, it does mean one more pointer per string, which isn't great, but it means fewer tables--when a string changes encodings but not character sets we don't have to have a whole new table that mushes together the character set and encoding.
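
To make the split concrete, here's a rough sketch in C of what two tables per string could look like. Every name in it (Encoding, CharType, String, count_upper, and the table instances) is made up for illustration and is not Parrot's actual layout. It assumes an encoding only knows how to turn bytes into code points and a character set only knows what a code point means, so two encodings and two character sets need four small tables rather than one combined table per pairing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch only: none of these names come from Parrot. */

/* Encoding table: byte stream <-> code point mapping, nothing else. */
typedef struct Encoding {
    const char *name;
    size_t unit;  /* bytes consumed per code point (fixed width here) */
    uint32_t (*decode)(const unsigned char *buf, size_t *pos);
} Encoding;

/* Character set table: what a code point means, nothing else. */
typedef struct CharType {
    const char *name;
    int (*is_upper)(uint32_t cp);
} CharType;

/* The cost of the split: two table pointers per string instead of one. */
typedef struct String {
    const unsigned char *buf;
    size_t len;                /* length in bytes */
    const Encoding *encoding;
    const CharType *chartype;
} String;

static uint32_t decode_fixed8(const unsigned char *buf, size_t *pos) {
    return buf[(*pos)++];
}

static uint32_t decode_fixed16be(const unsigned char *buf, size_t *pos) {
    uint32_t cp = ((uint32_t)buf[*pos] << 8) | buf[*pos + 1];
    *pos += 2;
    return cp;
}

static int ascii_is_upper(uint32_t cp) {
    return cp >= 'A' && cp <= 'Z';
}

/* ISO-8859-1 adds uppercase letters at 0xC0..0xDE, except 0xD7 (the
 * multiplication sign). */
static int latin1_is_upper(uint32_t cp) {
    return ascii_is_upper(cp) || (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7);
}

/* N encodings + M character sets, not N*M combined tables. */
const Encoding ENC_FIXED8    = { "fixed8",    1, decode_fixed8 };
const Encoding ENC_FIXED16BE = { "fixed16be", 2, decode_fixed16be };
const CharType CS_ASCII      = { "ascii",     ascii_is_upper };
const CharType CS_LATIN1     = { "latin1",    latin1_is_upper };

/* Generic code goes through both tables and never needs to know which
 * pairing it has been handed. Trailing partial units are ignored. */
size_t count_upper(const String *s) {
    size_t pos = 0, n = 0;
    while (pos + s->encoding->unit <= s->len) {
        uint32_t cp = s->encoding->decode(s->buf, &pos);
        if (s->chartype->is_upper(cp))
            n++;
    }
    return n;
}
```

With this layout, re-encoding a string from 8-bit to 16-bit only swaps the encoding pointer and the buffer; the chartype pointer stays put and no combined table has to exist for the new pairing.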

It may turn out that we have relatively few encoding/set variants, in which case that decision can be revisited and we can go with an implementation that presents two tables conceptually but a single unified table for the implementation.
--
Dan


--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
