There are a number of shortcomings in the API that I'd like to address here and propose improvements for.
Not so much with the string_* functions themselves, but rather with how they work (the encoding API, the transcoding functions).

To allow user-defined encodings and user-defined transcoding (written in Parrot), the first parameter of all of the function pointers in the ENCODING and TYPE structures should be INTERP. This, of course, means that the first parameter to all of the string_* functions needs to be INTERP, too. Preferably all of them, not just the ones which actually use ->encoding or ->type, so that we'll have the freedom to change them so they *do* use ->encoding and ->type in the future (and for consistency). Note that currently, *not* all of the string_ functions have INTERP as their first param.

For encodings other than the built-in ones, the string data should not *need* to be inside of the string buffer itself (though of course it still *can* be); we should be able to build it lazily, or read it from a disk, or get it by calling methods on a PMC, or whatever. This will likely mean having one or more pObjs (PMCs, generally) stored in the buffer. This, in turn, means we need a 'mark' entry in the encoding vtable, to prevent them from being cleaned up from under us.

We should also have the option of allocating non-GCed memory from the system and freeing it when the string is freed; this means we need a 'destroy' entry in the encoding vtable. (For the builtin string types, 'mark' and 'destroy' will of course be NULL. For custom strings, they might not be.)

(For simplicity, we might allow the buffer to consist *only* of either raw data *or* pointers to pObjs; then we only need a single flag to tell the GC which we have, and don't need a 'mark' or 'destroy'.)

I *really* *really* want string iterators. The current API for iterating through the characters of a string is, IMHO, vastly insufficient. The following are what I want for string iterators:

1/ Iterators won't become invalid if the string gets moved in memory. Currently, all we've got is a void* pointer which points into the buffer of the string; during GC, strings can get reallocated, making the pointer invalid. For that matter, if you grow a string while iterating forward through it, there's a good chance that your iterator will become invalid.

2/ Iterators should be integers, structs, or pointers into immobile memory (memory which won't be moved during GC). They should not need to be anchored to avoid being GCed, nor freed to avoid memory leakage. If, for the builtin types, we changed that 'void*' pointer to an integer indicating a number of bytes from strstart, then conditions 1 and 2 would be satisfied for them.

3/ The encoding functions (iterator creation, advancement, etc.) should be able to call into Parrot code; thus, they need an INTERP as the first parameter.

4/ It should take O(1) time to get an iterator to the start or end of a string.

5/ It should take O(n) time to advance an iterator n characters (either forwards or backwards). It would be nice if it took O(1) time, but it's not necessary.

6/ It should take O(1) time to decode whatever character is at the iterator.

7/ If two iterators are N characters apart, it should take O(N) time to measure that distance.

8/ The encoding/iterator API should be sufficiently complete to allow someone to write a character-rope string type, and have it work seamlessly with other strings.

9/ New ops which provide access to the string iterator API.

10/ Add methods to PerlString to make it compatible with Iterator.

11/ Any string_ function which takes a character index as a parameter should be able to take a string iterator.

12/ The rx engine should use the new ops.

12a/ We should be able to use the rx engine to "match" a stream of values from an Iterator PMC. Whether this Iterator is crawling over a PerlString, or PerlArray, or something else, shouldn't matter to the rx engine.
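
To make the vtable and iterator ideas above a bit more concrete, here is a minimal C sketch. Everything in it is an assumption for illustration only: the names Interp, Str, StrIter, Encoding, singlebyte, move_buffer and demo_char_after_move are hypothetical stand-ins, not Parrot's actual definitions, and the only encoding shown is a trivial one-byte-per-character one. It shows three things: every encoding entry takes the interpreter first, 'mark' and 'destroy' are NULL for a builtin encoding, and an iterator stored as a byte offset from strstart is still usable after the buffer has been moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are Parrot's real types. */
typedef struct Interp { int unused; } Interp;
typedef struct Str Str;
typedef size_t StrIter;   /* byte offset from strstart, not a pointer */

typedef struct Encoding {
    /* Every entry takes the interpreter first, so a user-defined
       encoding could call back into interpreted code. */
    StrIter  (*begin)(Interp *, const Str *);
    StrIter  (*end)(Interp *, const Str *);
    StrIter  (*advance)(Interp *, const Str *, StrIter, size_t n);
    unsigned (*decode)(Interp *, const Str *, StrIter);
    void     (*mark)(Interp *, Str *);     /* NULL for builtin encodings */
    void     (*destroy)(Interp *, Str *);  /* NULL for builtin encodings */
} Encoding;

struct Str {
    char           *strstart;   /* may be moved by the collector */
    size_t          bufused;    /* bytes in use */
    const Encoding *encoding;
};

/* A one-byte-per-character encoding: begin/end/decode are O(1). */
static StrIter sb_begin(Interp *in, const Str *s)
{ (void)in; (void)s; return 0; }

static StrIter sb_end(Interp *in, const Str *s)
{ (void)in; return s->bufused; }

static StrIter sb_advance(Interp *in, const Str *s, StrIter it, size_t n)
{
    (void)in;
    /* Clamp at the end of the string instead of running past it. */
    return (n > s->bufused - it) ? s->bufused : it + n;
}

static unsigned sb_decode(Interp *in, const Str *s, StrIter it)
{
    (void)in;
    assert(it < s->bufused);    /* decoding at end() is a caller error */
    return (unsigned char)s->strstart[it];
}

static const Encoding singlebyte = {
    sb_begin, sb_end, sb_advance, sb_decode, NULL, NULL
};

/* Simulates the collector relocating the string's buffer. */
static int move_buffer(Str *s)
{
    char *fresh = malloc(s->bufused);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, s->strstart, s->bufused);
    free(s->strstart);
    s->strstart = fresh;
    return 0;
}

/* Takes an iterator to the second character of "hello", moves the
   buffer, then decodes through the same iterator.
   Returns the decoded character, or -1 if allocation fails. */
int demo_char_after_move(void)
{
    Interp interp = { 0 };
    Str s;
    StrIter it;
    int c = -1;

    s.bufused  = 5;
    s.encoding = &singlebyte;
    s.strstart = malloc(s.bufused);
    if (s.strstart == NULL)
        return -1;
    memcpy(s.strstart, "hello", s.bufused);

    it = s.encoding->begin(&interp, &s);
    it = s.encoding->advance(&interp, &s, it, 1);

    if (move_buffer(&s) == 0)
        c = (int)s.encoding->decode(&interp, &s, it);

    free(s.strstart);
    return c;
}
```

A void* iterator taken before move_buffer() would be left pointing at freed memory, whereas the offset is simply re-applied to the new strstart. For a variable-width encoding, advance would have to walk the bytes, which is where the O(n) bound in item 5 comes from.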