If the string API is to be revised, I would like to suggest a single string vtable that merges the current encoding and chartype structures.
This removes one pointer from each string header, and allows a single parameter to be used instead of two for transcode, etc. Also, the IO system will need a mechanism for specifying the character set used by files; this again could then be a single value.

I do not believe that the two existing parameters are orthogonal, so the number of charset (or whatever) entities would be smaller than the cross product. For example, the existing 2 chartypes x 4 encodings would really only require 4 charsets.

I actually implemented this change some time ago as part of my 'African Grey' variant; an extract from my charset.h appears below.

The get_unicode and put_unicode entries combine the get and put operations with transcoding; this simplifies the transcode operation significantly.

find_substring was an experimental feature that simply replaced the two calls to skip_forward used by string_substr; it also implemented the optimisation for single-byte encodings that has subsequently been catered for by specific code in string_substr.

------------------------------------------------------------------------------------------
enum {
    enum_charset_usascii,
    enum_charset_utf8,
    enum_charset_utf16,
    enum_charset_utf32,
    enum_charset_MAX
};

struct parrot_charset_t {
    INTVAL index;
    const char *name;
    Parrot_UInt max_bytes;
    Parrot_UInt (*length)(const void *ptr, Parrot_UInt bytes);
    const void *(*skip_forward)(const void *ptr, Parrot_UInt n);
    const void *(*skip_backward)(const void *ptr, Parrot_UInt n);
    Parrot_UInt (*get)(const void *ptr);
    /* get combined with transcoding */
    Parrot_UInt (*get_unicode)(const void *ptr);
    void *(*put)(void *ptr, Parrot_UInt c);
    /* put combined with transcoding */
    void *(*put_unicode)(void *ptr, Parrot_UInt c);
    Parrot_Int (*is_digit)(Parrot_UInt c);
    Parrot_Int (*get_digit)(Parrot_UInt c);
    /* experimental: replaces the two skip_forward calls in string_substr */
    void (*find_substring)(const void *ptr, Parrot_UInt *start,
                           Parrot_UInt *length);
};
------------------------------------------------------------------------------------------

--
Peter Gibbs
EmKel Systems
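
To illustrate the point about get_unicode and put_unicode simplifying the transcode operation, here is a minimal sketch. It is not the African Grey code: the demo_ names, the cut-down vtable, the use of size_t and uint32_t in place of Parrot_UInt, and the reading of max_bytes as "most bytes one character can occupy" are all assumptions made for illustration. Only UTF-8 and native-order UTF-32 are shown, and input is assumed to be well formed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down vtable: only the entries transcoding needs. */
typedef struct demo_charset {
    const char *name;
    size_t max_bytes;   /* assumed: most bytes one character can occupy */
    size_t (*length)(const void *ptr, size_t bytes);
    const void *(*skip_forward)(const void *ptr, size_t n);
    uint32_t (*get_unicode)(const void *ptr);
    void *(*put_unicode)(void *ptr, uint32_t c);
} demo_charset;

/* UTF-8 entries (no validation of malformed input). */
static size_t utf8_width(unsigned char lead) {
    return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
}
static size_t utf8_length(const void *ptr, size_t bytes) {
    const unsigned char *p = ptr;
    size_t n = 0;
    for (size_t i = 0; i < bytes; i++)
        if ((p[i] & 0xC0) != 0x80) n++;   /* count non-continuation bytes */
    return n;
}
static const void *utf8_skip_forward(const void *ptr, size_t n) {
    const unsigned char *p = ptr;
    while (n-- > 0) p += utf8_width(*p);
    return p;
}
static uint32_t utf8_get_unicode(const void *ptr) {
    const unsigned char *p = ptr;
    size_t w = utf8_width(*p);
    if (w == 1) return *p;
    uint32_t c = *p & (0xFFu >> (w + 1)); /* payload bits of the lead byte */
    for (size_t i = 1; i < w; i++) c = (c << 6) | (p[i] & 0x3Fu);
    return c;
}
static void *utf8_put_unicode(void *ptr, uint32_t c) {
    unsigned char *p = ptr;
    if (c < 0x80) {
        *p++ = (unsigned char)c;
    } else if (c < 0x800) {
        *p++ = (unsigned char)(0xC0 | (c >> 6));
        *p++ = (unsigned char)(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        *p++ = (unsigned char)(0xE0 | (c >> 12));
        *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (c & 0x3F));
    } else {
        *p++ = (unsigned char)(0xF0 | (c >> 18));
        *p++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (c & 0x3F));
    }
    return p;
}

/* UTF-32 entries, native byte order. */
static size_t utf32_length(const void *ptr, size_t bytes) {
    (void)ptr;
    return bytes / 4;
}
static const void *utf32_skip_forward(const void *ptr, size_t n) {
    return (const unsigned char *)ptr + 4 * n;
}
static uint32_t utf32_get_unicode(const void *ptr) {
    uint32_t c;
    memcpy(&c, ptr, sizeof c);            /* avoids alignment assumptions */
    return c;
}
static void *utf32_put_unicode(void *ptr, uint32_t c) {
    memcpy(ptr, &c, sizeof c);
    return (unsigned char *)ptr + 4;
}

static const demo_charset demo_utf8 = {
    "utf8", 4, utf8_length, utf8_skip_forward, utf8_get_unicode, utf8_put_unicode
};
static const demo_charset demo_utf32 = {
    "utf32", 4, utf32_length, utf32_skip_forward, utf32_get_unicode, utf32_put_unicode
};

/* Transcode between any two charsets, using Unicode code points as the
 * pivot. Returns the number of bytes written to dst. */
size_t demo_transcode(const demo_charset *from, const demo_charset *to,
                      const void *src, size_t src_bytes,
                      void *dst, size_t dst_bytes) {
    size_t chars = from->length(src, src_bytes);
    unsigned char *out = dst;
    assert(chars * to->max_bytes <= dst_bytes);  /* worst-case capacity check */
    for (size_t i = 0; i < chars; i++) {
        out = to->put_unicode(out, from->get_unicode(src));
        src = from->skip_forward(src, 1);
    }
    return (size_t)(out - (unsigned char *)dst);
}
```

With each charset exposing a Unicode-facing get and put, the transcode loop needs no knowledge of either encoding, and the caller passes one charset value per string rather than an encoding plus a chartype. For instance, passing the six UTF-8 bytes 68 C3 A9 E2 82 AC through demo_transcode with demo_utf8 as source and demo_utf32 as target writes twelve bytes holding the code points 0x68, 0xE9 and 0x20AC.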