If the string API is to be revised, I would like to suggest that
consideration be given to having a single string vtable, merging
the current encoding and chartype structures into a single one.

This removes one pointer from each string header, and allows
a single parameter to be used instead of two for transcode, etc.
Also, the IO system will need to have a mechanism for specifying
the character set used by files; this again could then be a single
value.
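
As a rough illustration of the saving, the header might change along
these lines. The structure and field names here are hypothetical
stand-ins, not the current Parrot definitions:

```c
#include <stddef.h>

/* All names below are hypothetical and exist only for illustration. */
struct sketch_encoding;   /* opaque: stands in for the encoding vtable */
struct sketch_chartype;   /* opaque: stands in for the chartype vtable */
struct sketch_charset;    /* opaque: the proposed merged vtable */

/* String header carrying separate encoding and chartype pointers. */
struct sketch_string_two {
    void *bufstart;
    size_t buflen;
    const struct sketch_encoding *encoding;
    const struct sketch_chartype *type;
};

/* String header carrying a single merged charset pointer. */
struct sketch_string_one {
    void *bufstart;
    size_t buflen;
    const struct sketch_charset *charset;
};
```

A transcode entry point would likewise take one charset pointer as its
target rather than an encoding/chartype pair.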

I do not believe that the two existing parameters are orthogonal,
so the number of charset (or whatever) entities would be less than
the cross product. For example, the existing 2 chartypes x 4 encodings
would really only require 4 charsets, not 8.

I actually implemented this change some time ago as part of my
'African Grey' variant; an extract from my charset.h appears below.
The get_unicode and put_unicode entries combine the get and put
operations with transcoding; this simplifies the transcode operation
significantly. find_substring was an experimental feature that simply
replaced the two calls to skip_forward used by string_substr; it also
implemented the optimisation for single-byte encodings that has
subsequently been catered for by specific code in string_substr.

------------------------------------------------------------------------
enum {
    enum_charset_usascii,
    enum_charset_utf8,
    enum_charset_utf16,
    enum_charset_utf32,
    enum_charset_MAX
};

struct parrot_charset_t {
    INTVAL index;
    const char *name;
    Parrot_UInt max_bytes;
    Parrot_UInt (*length)(const void *ptr, Parrot_UInt bytes);
    const void *(*skip_forward)(const void *ptr, Parrot_UInt n);
    const void *(*skip_backward)(const void *ptr, Parrot_UInt n);
    Parrot_UInt (*get)(const void *ptr);
    /* get combined with transcoding */
    Parrot_UInt (*get_unicode)(const void *ptr);
    void *(*put)(void *ptr, Parrot_UInt c);
    /* put combined with transcoding */
    void *(*put_unicode)(void *ptr, Parrot_UInt c);
    Parrot_Int (*is_digit)(Parrot_UInt c);
    Parrot_Int (*get_digit)(Parrot_UInt c);
    /* experimental: replaces the two skip_forward calls in string_substr */
    void (*find_substring)(const void *ptr, Parrot_UInt *start,
                           Parrot_UInt *length);
};
------------------------------------------------------------------------
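
To show why combining get/put with transcoding simplifies the transcode
operation, here is a minimal, self-contained sketch. It is not the
African Grey code: the sketch_* names, the cut-down vtable, the use of
host-endian UTF-32, and the '?' substitution for unrepresentable
characters are all my assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down vtable: only the entries the loop needs. */
typedef struct sketch_charset {
    const char *name;
    unsigned max_bytes;
    uint32_t (*get_unicode)(const void *ptr);
    const void *(*skip_forward)(const void *ptr, size_t n);
    void *(*put_unicode)(void *ptr, uint32_t c);
} sketch_charset;

/* US-ASCII: one byte per character. */
static uint32_t ascii_get_unicode(const void *ptr) {
    return *(const unsigned char *)ptr;
}
static const void *ascii_skip_forward(const void *ptr, size_t n) {
    return (const unsigned char *)ptr + n;
}
static void *ascii_put_unicode(void *ptr, uint32_t c) {
    /* Assumption: code points outside ASCII are replaced by '?'. */
    *(unsigned char *)ptr = (c < 0x80) ? (unsigned char)c : (unsigned char)'?';
    return (unsigned char *)ptr + 1;
}

/* UTF-32 in host byte order: four bytes per character. */
static uint32_t utf32_get_unicode(const void *ptr) {
    uint32_t c;
    memcpy(&c, ptr, sizeof c);   /* memcpy avoids alignment problems */
    return c;
}
static const void *utf32_skip_forward(const void *ptr, size_t n) {
    return (const unsigned char *)ptr + 4 * n;
}
static void *utf32_put_unicode(void *ptr, uint32_t c) {
    memcpy(ptr, &c, sizeof c);
    return (unsigned char *)ptr + 4;
}

const sketch_charset sketch_usascii = {
    "usascii", 1, ascii_get_unicode, ascii_skip_forward, ascii_put_unicode
};
const sketch_charset sketch_utf32 = {
    "utf32", 4, utf32_get_unicode, utf32_skip_forward, utf32_put_unicode
};

/* Transcode nchars characters from src to dst, using Unicode code
 * points as the pivot. dst must have room for nchars * to->max_bytes
 * bytes. Returns a pointer just past the last byte written. */
void *sketch_transcode(const sketch_charset *from, const void *src,
                       size_t nchars, const sketch_charset *to, void *dst)
{
    while (nchars-- > 0) {
        dst = to->put_unicode(dst, from->get_unicode(src));
        src = from->skip_forward(src, 1);
    }
    return dst;
}
```

With every charset able to read and write Unicode code points directly,
the transcode loop needs no per-pair conversion table; adding a charset
means supplying one vtable rather than a converter for each existing one.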

--
Peter Gibbs
EmKel Systems
