There are a number of shortcomings in the API, which I'd like to address
here, and propose improvements for.

The problems are not so much with the string_* functions themselves as
with how they work underneath (the encoding API, the transcoding
functions).

To allow user-defined encodings and user-defined transcoding (written
in Parrot), the first parameter of all of the function pointers in the
ENCODING and TYPE structures should be INTERP.

This, of course, means that the first parameter to all of the string_*
functions needs to be INTERP, too.  Preferably all of them, not just
the ones which actually use ->encoding or ->type, so that we'll have
the freedom to change them so they *do* use ->encoding and ->type in the
future (and for consistency).  Note that currently, *not* all of the
string_ functions have INTERP as their first param.
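
As a minimal sketch of the idea (not actual Parrot code): the Interp,
Encoding and String types below are hypothetical stand-ins, and the
entry names 'characters' and 'decode' are only assumed for illustration.
The point is just that every function pointer, and every string_-style
wrapper, takes the interpreter first, whether or not it uses it today.

```c
#include <stddef.h>

/* Hypothetical stand-in for the interpreter; the real one is far larger. */
typedef struct Interp { int calls; } Interp;

/* Hypothetical encoding vtable: every entry takes the interpreter first,
   so a user-defined encoding could call back into Parrot code. */
typedef struct Encoding {
    const char *name;
    size_t (*characters)(Interp *interp, const void *buf, size_t bytes);
    unsigned long (*decode)(Interp *interp, const void *ptr);
} Encoding;

typedef struct String {
    const void *bufstart;
    size_t buflen;
    const Encoding *encoding;
} String;

/* A built-in single-byte encoding has no real need for interp; here it
   only bumps a counter so the calls are observable. */
static size_t singlebyte_characters(Interp *interp, const void *buf,
                                    size_t bytes) {
    (void)buf;
    interp->calls++;
    return bytes;               /* one character per byte */
}

static unsigned long singlebyte_decode(Interp *interp, const void *ptr) {
    interp->calls++;
    return *(const unsigned char *)ptr;
}

static const Encoding singlebyte = {
    "singlebyte", singlebyte_characters, singlebyte_decode
};

/* Hypothetical string_length-style wrapper: takes interp first and simply
   passes it through to the encoding. */
size_t sketch_string_length(Interp *interp, const String *s) {
    return s->encoding->characters(interp, s->bufstart, s->buflen);
}
```

Because the wrapper already receives interp, it could later start using
->encoding or ->type without its signature, or any caller, changing.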

For encodings other than the built-in ones, the string data should not
*need* to be inside of the string buffer itself (though of course it
still *can* be); we should be able to build it lazily, or read it from
disk, or get it by calling methods on a PMC, or whatever.  This will
likely mean having one or more pObjs (PMCs, generally) stored in the
buffer.  This, in turn, means we need a 'mark' entry in the encoding
vtable, to prevent those pObjs from being collected out from under us.
We should also have the option of allocating non-GCed memory from the
system and freeing it when the string is freed; this means we need a
'destroy' entry in the encoding vtable.

(For the built-in string types, 'mark' and 'destroy' will of course be
NULL.  For custom strings, they might not be.)
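
A rough sketch of how 'mark' and 'destroy' might be wired up, under
loud assumptions: the Interp, Encoding and String types are hypothetical
stand-ins, the sketch_gc_* functions only imitate what a collector would
do, and the 'mark' hook just counts calls where a real one would mark
each pObj held in the buffer.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not the real Parrot structures. */
typedef struct Interp { int marked; } Interp;
typedef struct String String;

typedef struct Encoding {
    void (*mark)(Interp *interp, String *s);     /* NULL for built-ins */
    void (*destroy)(Interp *interp, String *s);  /* NULL for built-ins */
} Encoding;

struct String {
    void *bufstart;
    size_t buflen;
    const Encoding *encoding;
};

/* Custom encoding whose buffer is system memory rather than GCed memory. */
static void sysmem_mark(Interp *interp, String *s) {
    (void)s;
    interp->marked++;   /* a real hook would mark the pObjs in the buffer */
}

static void sysmem_destroy(Interp *interp, String *s) {
    (void)interp;
    free(s->bufstart);  /* release the non-GCed memory with the string */
    s->bufstart = NULL;
    s->buflen = 0;
}

static const Encoding builtin_enc = { NULL, NULL };
static const Encoding sysmem_enc  = { sysmem_mark, sysmem_destroy };

/* What the collector would do for each live string: skip NULL entries. */
void sketch_gc_mark_string(Interp *interp, String *s) {
    if (s->encoding->mark)
        s->encoding->mark(interp, s);
}

/* What the collector would do when a string is freed. */
void sketch_gc_free_string(Interp *interp, String *s) {
    if (s->encoding->destroy)
        s->encoding->destroy(interp, s);
}
```

The built-in types pay only a NULL check per string; custom encodings
get a chance to keep their pObjs alive and to release system memory.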

(For simplicity, we might allow the buffer to *only* consist of either
raw data, *or* of pointers to pObjs; then we only need a single flag to
tell the gc which we have, and don't need a 'mark' or 'destroy')


I *really* *really* want string iterators.  The current API for
iterating through the characters of a string is, IMHO, vastly
insufficient.

The following are what I want for string iterators:

   1/ Iterators shouldn't become invalid if the string gets moved in memory.

Currently, all we've got is a void* pointer which points into the buffer
of the string; during GC, strings can get reallocated, making the
pointer invalid.  For that matter, if you grow a string, while iterating
forward through it, there's a good chance that your iterator will become
invalid.

   2/ Iterators should be integers, structs, or pointers into immobile
memory (memory which won't be moved during GC).  They should not need to
be anchored to avoid being GCed, nor freed to avoid memory leakage.

If, for the builtin types, we changed that 'void*' pointer to an
integer, indicating a number of bytes from strstart, then conditions 1&2
would be satisfied for them.
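
To illustrate why an offset survives a move where a raw pointer does
not, here is a small sketch.  The String struct and the function names
are hypothetical, and sketch_move_buffer only simulates what a GC run
(or growing the string) would do to the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down string: just a movable buffer and its length. */
typedef struct String {
    char *strstart;
    size_t bufused;
} String;

/* The iterator is a plain integer: a byte offset from strstart.  It needs
   no anchoring and no freeing. */
typedef size_t StrIter;

unsigned char sketch_iter_byte(const String *s, StrIter it) {
    return (unsigned char)s->strstart[it];
}

/* Simulate the collector moving the buffer: copy it into a fresh block and
   release the old one.  Returns 0 on success, -1 if allocation fails. */
int sketch_move_buffer(String *s) {
    char *fresh = malloc(s->bufused ? s->bufused : 1);
    if (!fresh)
        return -1;
    memcpy(fresh, s->strstart, s->bufused);
    free(s->strstart);      /* any old void* into this block now dangles */
    s->strstart = fresh;
    return 0;
}
```

An iterator taken before sketch_move_buffer still names the same byte
afterwards, because it is always resolved against the current strstart.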

   3/ The encoding functions (iterator creation, advancement, etc.)
should be able to call into parrot code; thus, they need an INTERP as
the first parameter.

   4/ It should take O(1) time to get an iterator to the start or end of
a string.

   5/ It should take O(n) time to advance an iterator n characters
(either forwards or backwards).  It would be nice if it took O(1) time,
but it's not necessary.

   6/ It should take O(1) time to decode whatever characters are at the
iterator.

   7/ If two iterators are N characters apart, it should take O(N) time
to measure that distance.

   8/ The encoding/iterator API should be sufficiently complete to allow
someone to write a character-rope string type, and have it work
seamlessly with other strings.

   9/ New ops which provide access to the string iterator API.

   10/ Add methods to PerlString to make it compatible with Iterator.

   11/ Any string_ function which takes a character index as a
parameter, should be able to take a string iterator.

   12/ The rx engine should use the new ops.

   12a/ We should be able to use the rx engine to "match" a stream of
values from an Iterator PMC.  Whether this Iterator is crawling over a
PerlString, or PerlArray, or something else, shouldn't matter to the rx
engine.
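
The cost requirements in 4/ through 7/ above can be met by a
byte-offset iterator even for a variable-width encoding.  The sketch
below assumes well-formed UTF-8 and uses hypothetical names throughout
(String, StrIter, sketch_iter_*); interp is left out to keep it short.
It is one possible shape, not a proposal for the final API.

```c
#include <stddef.h>

/* Hypothetical, pared-down string holding well-formed UTF-8. */
typedef struct String {
    const unsigned char *strstart;
    size_t bufused;
} String;

typedef size_t StrIter;     /* byte offset from strstart */

/* 4/ start and end are O(1). */
StrIter sketch_iter_start(const String *s) { (void)s; return 0; }
StrIter sketch_iter_end(const String *s)   { return s->bufused; }

static int is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

/* 5/ Advance n characters (negative n goes backwards), clamping at either
   end.  Each character is at most 4 bytes, so this is O(n). */
StrIter sketch_iter_advance(const String *s, StrIter it, long n) {
    while (n > 0 && it < s->bufused) {
        it++;
        while (it < s->bufused && is_continuation(s->strstart[it]))
            it++;
        n--;
    }
    while (n < 0 && it > 0) {
        it--;
        while (it > 0 && is_continuation(s->strstart[it]))
            it--;
        n++;
    }
    return it;
}

/* 6/ Decode the character at the iterator: at most 4 bytes read, so O(1).
   The iterator must not be at the end of the string. */
unsigned long sketch_iter_decode(const String *s, StrIter it) {
    const unsigned char *p = s->strstart + it;
    if (p[0] < 0x80)
        return p[0];
    if ((p[0] & 0xE0) == 0xC0)
        return ((p[0] & 0x1FUL) << 6) | (p[1] & 0x3FUL);
    if ((p[0] & 0xF0) == 0xE0)
        return ((p[0] & 0x0FUL) << 12) | ((p[1] & 0x3FUL) << 6)
             | (p[2] & 0x3FUL);
    return ((p[0] & 0x07UL) << 18) | ((p[1] & 0x3FUL) << 12)
         | ((p[2] & 0x3FUL) << 6) | (p[3] & 0x3FUL);
}

/* 7/ Distance in characters from a to b (a <= b, both on character
   boundaries): count the lead bytes in between, O(N). */
size_t sketch_iter_distance(const String *s, StrIter a, StrIter b) {
    size_t n = 0;
    for (; a < b; a++)
        if (!is_continuation(s->strstart[a]))
            n++;
    return n;
}
```

A fixed-width encoding could implement the same entries with plain
arithmetic and get O(1) advancement and distance for free.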

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED]
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}
