Leopold Toetsch wrote:
> 
> Benjamin Goldberg <[EMAIL PROTECTED]> wrote:
> > Leopold Toetsch wrote:
> >>
> >> Benjamin Goldberg <[EMAIL PROTECTED]> wrote:
> >> > There are a number of shortcomings in the API, which I'd like to
> >> > address here, and propose improvements for.
> >>
> >> > To allow user-defined encodings, and user-defined transcoding,
> >> > (written in parrot) the first parameter of all of the function
> >> > pointers in the ENCODING and TYPE structures should be INTERP.
> >>
> >> This belongs IMHO into PerlString (or better a class derived from
> >> that).
> 
> > Then how do we pass a user-defined string to a function which expects
> > an argument in an Sreg?
> 
> We have here IMHO the same relationship as with ints:
> int (INTVAL) <-> Int (PerlInt)
> str (STRING) <-> Str (PerlString)
> You can't tie native types, you can't attach properties on these and so
> on. And you can't pass some kind of active native strings in the SReg.
> The user-defined (written in parrot) implies, that these are
> PerlString PMCs.

Not having an INTERP argument severely limits us, even in other ways.

Even ignoring the problem of an encoding being an "active" string, what
if it needs to allocate memory (perhaps some temporary buffer for a
computation needed to decode the bytes of a string)?  Calls to
sys_mem_allocate and sys_mem_free all over the place?  Blech.  I want to
be able to allocate a garbage collectible Buffer :(

This also eliminates the chance of having a STRING* which is attached to
a file -- we'd only be able to do such a thing as a PerlString
derivative.

Similarly, that would eliminate the chance of a STRING* which is
actually a lazily concatenated list of other STRING*s; we'd only be able
to do this as a PerlString derivative.

> > Now, suppose that instead of a pointer, we had an integer describing
> > the number of bytes from strstart to where we're looking... *now* most
> > of the problems go away.
> 
> So let's change the encoding->skip_{for,back}ward to take/return an
> INTVAL being the byte-position relative to strstart.

And they need to take str->strstart! :)

I said "most", not "all".  It solves the problems incurred with the
string buffer getting moved by gc (which is good), but it doesn't solve
everything.
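
To make the gc point concrete, here's a toy sketch in plain C.  Nothing
in it is real Parrot API: demo_string, demo_move_buffer and demo_byte_at
are names I made up for illustration, and malloc/free merely stand in
for the collector moving a buffer.  The only point is that an offset is
re-derived from strstart at each use, so it stays meaningful after a
move, where a saved char* would dangle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a STRING; not Parrot's real layout. */
typedef struct {
    char  *strstart;  /* start of the buffer; may change when it is moved */
    size_t bufused;   /* number of bytes in use */
} demo_string;

/* Simulate the collector relocating the buffer.
 * Returns 0 on success, -1 if no memory was available. */
static int demo_move_buffer(demo_string *s) {
    char *fresh = malloc(s->bufused ? s->bufused : 1);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, s->strstart, s->bufused);
    free(s->strstart);      /* any char* into the old buffer is now invalid */
    s->strstart = fresh;
    return 0;
}

/* Read the byte at a byte offset relative to the current strstart. */
static char demo_byte_at(const demo_string *s, size_t offset) {
    assert(offset < s->bufused);
    return s->strstart[offset];
}
```

Holding on to offset 1 of "hello" gives 'e' both before and after
demo_move_buffer; a pointer saved before the move would not.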

In particular, suppose we make a STRING* encoding which is a lazily
concatted list of other strings.  (Yes, I keep going back to it, but
Larry said we'll have them.  Theoretically he might have only meant as a
high level type, a PerlString derivative, but *I* think it would be nice
to be able to have this as a STRING* type.)  Then we'd want our iterator
to be two integers: the first being the index into our array, the
second being the iterator into the string we're currently iterating
through.
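
Roughly what I have in mind for the two-integer iterator, as a toy
sketch in plain C.  All of the names (demo_piece, demo_concat,
demo_iter and the functions) are hypothetical, and I'm assuming
single-byte characters in every piece to keep it short; a real version
would delegate the inner step to each piece's own encoding.

```c
#include <assert.h>
#include <stddef.h>

/* One member of the lazily concatenated list (hypothetical layout). */
typedef struct {
    const char *bytes;
    size_t      len;
} demo_piece;

typedef struct {
    const demo_piece *pieces;
    size_t            count;
} demo_concat;

/* The iterator: two integers instead of a pointer. */
typedef struct {
    size_t piece;   /* index into the array of pieces */
    size_t offset;  /* position inside the current piece */
} demo_iter;

/* Move past exhausted or empty pieces, so the iterator either names a
 * real character or has piece == count (end of string). */
static demo_iter demo_normalize(const demo_concat *c, demo_iter it) {
    while (it.piece < c->count && it.offset >= c->pieces[it.piece].len) {
        it.piece++;
        it.offset = 0;
    }
    return it;
}

static demo_iter demo_iter_start(const demo_concat *c) {
    demo_iter it = {0, 0};
    return demo_normalize(c, it);
}

static int demo_iter_at_end(const demo_concat *c, demo_iter it) {
    return it.piece >= c->count;
}

/* Advance n characters, stopping early at the end of the string. */
static demo_iter demo_skip_forward(const demo_concat *c, demo_iter it,
                                   size_t n) {
    it = demo_normalize(c, it);
    while (n > 0 && it.piece < c->count) {
        size_t left = c->pieces[it.piece].len - it.offset;
        if (n < left) {
            it.offset += n;     /* target lies inside this piece */
            n = 0;
        } else {
            n -= left;          /* consume the rest of this piece */
            it.piece++;
            it.offset = 0;
            it = demo_normalize(c, it);
        }
    }
    return it;
}

static char demo_iter_get(const demo_concat *c, demo_iter it) {
    assert(!demo_iter_at_end(c, it));
    return c->pieces[it.piece].bytes[it.offset];
}
```

With the pieces "foo", "" and "bar", skipping 3 characters from the
start lands on piece 2, offset 0 (the 'b'), and skipping 6 reaches the
end.  Neither integer cares where the pieces' buffers happen to live.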

> >> >    11/ Any string_ function which takes a character index as a
> >> > parameter, should be able to take a string iterator.
> >>
> >> Bloat IMHO. While this abstraction is flexible, it IMHO doesn't
> >> belong into the string subsystem but into a string class, that
> >> implements these functions.
> 
> > The bloat can be avoided if the primary string_ implementations *only*
> > took string iterators.  Then, to satisfy those who want to use
> > character indices, provide wrappers which take character index
> > arguments, and convert them into string iterators relative to those
> > particular strings.
> 
> Ok. I see. That's fine - except for utf8 strings.

Why wouldn't it work for utf8 strings?

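   INTVAL string_index_to_iterator(struct Parrot_Interp *interpreter,
                                   STRING *s, INTVAL index) {
      /* iterator for the first character of s */
      INTVAL start = s->encoding->iterator_start(interpreter, s);
      /* step forward index characters; returns the new iterator */
      return s->encoding->skip_forward(interpreter, s, start, index);
   }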

Converting an index to an iterator for a utf8 string crawls through the
string and finds the byte offset.
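
For the record, the crawl could look something like this.  It's a
standalone sketch, not the proposed encoding vtable: the function names
and the plain (start, len) arguments are hypothetical, and it assumes
well-formed utf8 with the starting offset on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte offset reached by skipping n characters forward from
 * byte offset pos in a utf8 buffer of len bytes.  Stops at len if the
 * string runs out first. */
static size_t demo_utf8_skip_forward(const unsigned char *start, size_t len,
                                     size_t pos, size_t n) {
    assert(pos <= len);
    while (n > 0 && pos < len) {
        pos++;                               /* step over the lead byte */
        while (pos < len && (start[pos] & 0xC0) == 0x80)
            pos++;                           /* continuation bytes: 10xxxxxx */
        n--;
    }
    return pos;
}

/* Character index -> byte offset, by crawling from the beginning. */
static size_t demo_utf8_index_to_offset(const unsigned char *start,
                                        size_t len, size_t index) {
    return demo_utf8_skip_forward(start, len, 0, index);
}
```

It's linear in the index, which is the price of a variable-width
encoding, but it's paid once per conversion; after that the caller
holds a byte offset and stepping from it costs one character at a time.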

> But these could be converted to utf32 as soon as they are seen.

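For a long string, that could be quite a bit of bloat: utf32 spends four
bytes on every character, so mostly-ASCII text grows to four times its
utf8 size.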

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED]
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}
