If the string API is to be revised, I would like to suggest that
consideration be given to having a single string vtable, merging
the current encoding and chartype structures into one.
This would remove one pointer from each string header, and allow
transcode and similar functions to take one parameter instead of two.

Also, the IO system will need a mechanism for specifying the
character set used by files; with a merged vtable, this too could be
a single value.

I do not believe that the two existing parameters are orthogonal,
so the number of charset (or whatever) entities would be fewer than
the cross product. For example, the existing 2 chartypes x 4 encodings
would really only require 4 charsets.

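To make the pointer saving concrete, here is a rough sketch of a string
header before and after the merge, with one table entry per charset.
It is not taken from Parrot or from African Grey: the struct names, the
header fields, charset_table and charset_lookup are all hypothetical,
and Parrot_UInt is a stand-in typedef. The four names match the enum in
the charset.h extract.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for Parrot's unsigned integer typedef (assumption). */
typedef unsigned long Parrot_UInt;

/* Hypothetical cut-down charset entry; a real one would also carry
   the function pointers. */
struct charset_sketch {
    const char *name;
};

/* One entry per charset: 4 entries, not 2 chartypes x 4 encodings. */
static const struct charset_sketch charset_table[] = {
    { "usascii" }, { "utf8" }, { "utf16" }, { "utf32" }
};
#define CHARSET_COUNT (sizeof charset_table / sizeof charset_table[0])

/* Hypothetical header with separate encoding and chartype pointers. */
struct string_two_ptr {
    void *bufstart;
    Parrot_UInt buflen;
    const void *encoding;
    const void *type;
};

/* Hypothetical header after the merge: a single charset pointer. */
struct string_one_ptr {
    void *bufstart;
    Parrot_UInt buflen;
    const struct charset_sketch *charset;
};

/* Look up a charset by name, e.g. for an IO layer that tags a file
   with a single charset value. Returns NULL for an unknown name. */
static const struct charset_sketch *charset_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < CHARSET_COUNT; i++) {
        if (strcmp(charset_table[i].name, name) == 0)
            return &charset_table[i];
    }
    return NULL;
}
```

A file opened as "utf8" would then carry just the pointer returned by
charset_lookup("utf8"), rather than an encoding and a chartype.
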
I actually implemented this change some time ago as part of my
'African Grey' variant; an extract from my charset.h appears below.
The get_unicode and put_unicode entries combine the get and put
operations with transcoding; this simplifies the transcode operation
significantly. find_substring was an experimental feature that simply
replaced the two calls to skip_forward used by string_substr; it also
implemented the optimisation for single-byte encodings that has
subsequently been catered for by specific code in string_substr.
----------------------------------------------------------------------------
enum {
    enum_charset_usascii,
    enum_charset_utf8,
    enum_charset_utf16,
    enum_charset_utf32,
    enum_charset_MAX
};

struct parrot_charset_t {
    INTVAL index;
    const char *name;
    Parrot_UInt max_bytes;
    Parrot_UInt (*length)(const void *ptr, Parrot_UInt bytes);
    const void *(*skip_forward)(const void *ptr, Parrot_UInt n);
    const void *(*skip_backward)(const void *ptr, Parrot_UInt n);
    Parrot_UInt (*get)(const void *ptr);
    /* get combined with transcoding */
    Parrot_UInt (*get_unicode)(const void *ptr);
    void *(*put)(void *ptr, Parrot_UInt c);
    /* put combined with transcoding */
    void *(*put_unicode)(void *ptr, Parrot_UInt c);
    Parrot_Int (*is_digit)(Parrot_UInt c);
    Parrot_Int (*get_digit)(Parrot_UInt c);
    /* experimental: replaces the two skip_forward calls in string_substr */
    void (*find_substring)(const void *ptr, Parrot_UInt *start,
                           Parrot_UInt *length);
};
----------------------------------------------------------------------------
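To illustrate why get_unicode and put_unicode simplify transcoding, here
is a minimal sketch of a transcode loop built on them. It is not the
African Grey code. It assumes that put_unicode returns a pointer just
past the character it wrote and that the caller has sized the
destination buffer (at most max_bytes per character). struct
charset_sketch is a cut-down stand-in for parrot_charset_t, the two toy
charsets (usascii and host-byte-order utf32) and transcode_sketch are
hypothetical, and Parrot_UInt is a stand-in typedef.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for Parrot's unsigned integer typedef (assumption). */
typedef unsigned long Parrot_UInt;

/* Cut-down stand-in for parrot_charset_t: only what transcoding needs. */
struct charset_sketch {
    const char *name;
    Parrot_UInt max_bytes;
    Parrot_UInt (*length)(const void *ptr, Parrot_UInt bytes);
    const void *(*skip_forward)(const void *ptr, Parrot_UInt n);
    Parrot_UInt (*get_unicode)(const void *ptr);
    void *(*put_unicode)(void *ptr, Parrot_UInt c);
};

/* Toy usascii charset: one byte per character. */
static Parrot_UInt ascii_length(const void *ptr, Parrot_UInt bytes)
{ (void)ptr; return bytes; }
static const void *ascii_skip_forward(const void *ptr, Parrot_UInt n)
{ return (const unsigned char *)ptr + n; }
static Parrot_UInt ascii_get_unicode(const void *ptr)
{ return *(const unsigned char *)ptr; }
static void *ascii_put_unicode(void *ptr, Parrot_UInt c)
{
    unsigned char *p = ptr;
    /* Sketch-only policy: unrepresentable code points become '?'. */
    *p = (unsigned char)(c < 128 ? c : '?');
    return p + 1;
}

/* Toy utf32 charset: four bytes per character, host byte order. */
static Parrot_UInt utf32_length(const void *ptr, Parrot_UInt bytes)
{ (void)ptr; return bytes / 4; }
static const void *utf32_skip_forward(const void *ptr, Parrot_UInt n)
{ return (const unsigned char *)ptr + 4 * n; }
static Parrot_UInt utf32_get_unicode(const void *ptr)
{ uint32_t c; memcpy(&c, ptr, 4); return c; }
static void *utf32_put_unicode(void *ptr, Parrot_UInt c)
{
    uint32_t v = (uint32_t)c;
    memcpy(ptr, &v, 4);
    return (unsigned char *)ptr + 4;
}

static const struct charset_sketch ascii_sketch = {
    "usascii", 1, ascii_length, ascii_skip_forward,
    ascii_get_unicode, ascii_put_unicode
};
static const struct charset_sketch utf32_sketch = {
    "utf32", 4, utf32_length, utf32_skip_forward,
    utf32_get_unicode, utf32_put_unicode
};

/* Transcode src (src_bytes long, in charset `from`) into dst using
   charset `to`. Returns a pointer just past the last byte written. */
static void *transcode_sketch(const struct charset_sketch *from,
                              const struct charset_sketch *to,
                              const void *src, Parrot_UInt src_bytes,
                              void *dst)
{
    Parrot_UInt n;
    assert(from && to && src && dst);
    n = from->length(src, src_bytes);
    while (n--) {
        /* Unicode is the pivot, so no per-pair conversion code is needed. */
        dst = to->put_unicode(dst, from->get_unicode(src));
        src = from->skip_forward(src, 1);
    }
    return dst;
}
```

With Unicode as the common intermediate, each charset supplies one
get_unicode and one put_unicode, and the loop itself needs only a source
and a destination charset, with no separate encoding and chartype
arguments.
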
--
Peter Gibbs
EmKel Systems