Re: String API

Benjamin Goldberg Thu, 21 Aug 2003 16:23:49 -0700

Leopold Toetsch wrote:
> 
> Benjamin Goldberg wrote:
> 
> >
> > Leopold Toetsch wrote:
> > Not having an INTERP argument severely limits us, even in other ways.
> 
> The INTERP argument is fine. The user defined encoding is/was my
> problem.


As in, you think we shouldn't have any, at all?

> > Similarly, that would eliminate the chance of a STRING* which is
> > actually a lazily concatenated list of other STRING*s; we'd only be
> > able to do this as a PerlString derivative.
> 
> I have problems imagining such kinds of STRINGs.

You lack sufficient imagination -- Larry's suggested that Perl6 strings
may consist of a list of chunks.  I can easily imagine each of those
"chunks" being full-fledged STRING* objects.

A foolish question: can you imagine strings which are lazily read from a
file?

If so, could you imagine such a string, sitting in front of a really
really big file, bigger than could fit into memory?

Not only can I imagine all of that, I can imagine the pain that would
be caused, if such file-strings are only implemented at the PMC level,
not at the STRING level, and someone unknowingly converts such a string
from a PMC into a STRING, and forces the entire file to be loaded from
disk into memory.

> They need an attached PMC doing the work + an attached list containing
> the string chunks.

Not necessarily.  If we could have str->strstart as a pointer to a
vector of STRING*s, we wouldn't need any PMC to contain the chunks.  And
the str->encoding api is (already) sufficient for doing the work.  The
only lack is a custom mark, to keep the sub-strings alive.
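
To make that concrete, here is a rough sketch in C of a string whose
strstart points at a vector of sub-strings, plus the custom mark it would
need. The struct below is a simplified stand-in, not Parrot's real STRING
layout; the is_chunked and live fields and the function names string_mark
and string_total_bytes are assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a STRING header.
 * The real Parrot struct has more (and different) fields. */
typedef struct STRING STRING;

struct STRING {
    void   *strstart;   /* flat: raw bytes; chunked: array of STRING* */
    size_t  strlen;     /* flat: byte count; chunked: number of chunks */
    int     is_chunked; /* assumption: stands in for an encoding check */
    int     live;       /* assumption: stands in for a GC mark bit */
};

/* Hypothetical custom mark: keeps a string and all of its
 * sub-strings alive by marking them recursively. */
static void string_mark(STRING *s)
{
    if (s == NULL || s->live)
        return;
    s->live = 1;
    if (s->is_chunked) {
        STRING **chunks = (STRING **)s->strstart;
        for (size_t i = 0; i < s->strlen; i++)
            string_mark(chunks[i]);
    }
}

/* Total length in bytes, computed without ever flattening the chunks. */
static size_t string_total_bytes(const STRING *s)
{
    if (!s->is_chunked)
        return s->strlen;
    size_t total = 0;
    STRING *const *chunks = (STRING *const *)s->strstart;
    for (size_t i = 0; i < s->strlen; i++)
        total += string_total_bytes(chunks[i]);
    return total;
}
```

No PMC appears anywhere in this sketch; the only thing a flat string
doesn't already need is the recursive walk in string_mark.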

> You need a PMC anyway. Why not have this in a PerlString derived class.

If we have it in a PerlString derived class, and do not make it part of
STRING*, then we cannot pass such strings to C functions defined to
accept strings in STRING* parameters, or to Parrot subroutines which are
defined to accept strings in S-registers, or which move the strings from
P-registers to S-registers.

We would lose the magic, similar to how moving from a PerlInt to an
INTVAL loses magic.

Well, except that when a PerlInt loses magic going to an INTVAL, the
resulting integer generally takes *less* memory than it did as a PMC,
whereas losing magic by changing from a PMC to a STRING could very
easily result in using *more* memory.  (And doing lots of work, which we
wouldn't need if our string kept its magic).

> So you don't have an overhead on "average" strings.

How much speed overhead is there?

> > In particular, if we make a STRING* encoding which is a lazily
> > concatted list of other strings (yes I keep going back to it, but
> > Larry said we'll have them.  Theoretically he might have only meant as
> > a high level type (a PerlString derivative) but *I* think it would be
> > nice to be able to have this as a STRING* type) we'd want our
> > iterator to be two integers: the first one being the integer into our
> > array, the second being the iterator into the string we're currently
> > iterating through.
> 
> How many strings in JAPP[]1 might need that?

That depends.  Does concatenation in Perl6, by default, produce a lazy
concatenation, or an immediate actual concatenation?
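
The two-integer iterator from the quoted paragraph above could look
roughly like this in C. It is only a sketch under simplifying
assumptions: the chunks are plain byte arrays rather than STRING*s, and
the names LazyConcat, ConcatIter and concat_next are invented here.

```c
#include <stddef.h>

/* A lazily concatenated string: the pieces are never copied or joined. */
typedef struct {
    const char  **chunks;   /* the pieces, in order */
    const size_t *lens;     /* byte length of each piece */
    size_t        nchunks;
} LazyConcat;

/* The iterator is just two integers. */
typedef struct {
    size_t chunk;    /* index into the chunk array */
    size_t offset;   /* position inside that chunk */
} ConcatIter;

/* Return the next byte (0..255), or -1 once the end is reached. */
static int concat_next(const LazyConcat *s, ConcatIter *it)
{
    /* Step over exhausted (or empty) chunks. */
    while (it->chunk < s->nchunks && it->offset >= s->lens[it->chunk]) {
        it->chunk++;
        it->offset = 0;
    }
    if (it->chunk == s->nchunks)
        return -1;
    return (unsigned char)s->chunks[it->chunk][it->offset++];
}
```

Advancing costs a comparison or two per byte; nothing here requires the
concatenation to ever be made "actual".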

Furthermore, consider if we allowed something like:

   my str $slurp = File.new($filename).slurp(); # = File.slurp($filename)?

Sure, we could have this read in the whole file, but wouldn't it be
nicer if it would *lazily* fill in $slurp?
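
As a rough illustration of "lazily fill in", here is a sketch in C of a
file-backed string that reads only as far as the highest position anyone
has asked for. The type LazyFileStr and the functions lazy_init,
lazy_byte_at and lazy_free are hypothetical names, not anything in
Parrot. Note that this version keeps everything it has read so far, so it
covers the lazy case but not the bigger-than-memory case, which would
need a sliding window instead of a growing buffer.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    FILE  *fp;      /* source file, read on demand */
    char  *buf;     /* bytes loaded so far */
    size_t loaded;  /* number of valid bytes in buf */
    size_t cap;     /* allocated size of buf */
    int    eof;     /* set once the file is exhausted */
} LazyFileStr;

static void lazy_init(LazyFileStr *s, FILE *fp)
{
    s->fp = fp;
    s->buf = NULL;
    s->loaded = 0;
    s->cap = 0;
    s->eof = 0;
}

/* Return the byte at pos, or -1 if pos is past the end of the file
 * (or memory runs out). Reads only as much of the file as needed. */
static int lazy_byte_at(LazyFileStr *s, size_t pos)
{
    while (pos >= s->loaded && !s->eof) {
        if (s->loaded == s->cap) {
            size_t ncap = s->cap ? s->cap * 2 : 64;
            char *nbuf = realloc(s->buf, ncap);
            if (nbuf == NULL)
                return -1;
            s->buf = nbuf;
            s->cap = ncap;
        }
        size_t got = fread(s->buf + s->loaded, 1, s->cap - s->loaded, s->fp);
        if (got == 0)
            s->eof = 1;
        s->loaded += got;
    }
    return pos < s->loaded ? (unsigned char)s->buf[pos] : -1;
}

/* Release the buffer; the caller still owns (and closes) the FILE. */
static void lazy_free(LazyFileStr *s)
{
    free(s->buf);
    s->buf = NULL;
    s->loaded = s->cap = 0;
}
```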

> Do you really want to slow down all string access, just for one very
> special corner case?

I don't believe that it *would* slow down all string access.

> >>>... provide wrappers which take character index
> >>>arguments, and converts them into string iterators relative to those
> >>>particular strings.
> >>>
> >>Ok. I see. That's fine - except for utf8 strings.
> >>
> >
> > Why wouldn't it work for utf8 strings?
> 
> The wrapper is O(n) for utf8 strings. So converting once might be
> cheaper during the first character-index access.

For the current string code, we already take O(n) to get a void* pointer
into an appropriate part of a utf8 string, for each character-index.

If we factor the current code into functions taking iterators, an
index-to-iterator converter, and wrappers taking indices, then it
shouldn't be significantly slower than the current code (except for the
overhead of entering/leaving a function, which might be eliminated by
the C compiler inlining the wrapper and conversion function, if they're
small enough... or by us providing macro versions of the wrappers and
converter).

And if use of the wrappers is discouraged in favor of the iterator
versions, except in cases where random access to the string truly is
needed, then some speed improvements can be gotten.
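
Here is a rough sketch in C of the factoring I have in mind, for utf8.
It is not the current Parrot string code: the names Utf8Iter,
utf8_seq_len, utf8_iter_next, utf8_iter_at and utf8_char_at are invented
for illustration, and the code assumes well-formed UTF-8 with no
validation.

```c
#include <stddef.h>

/* For utf8 the iterator is simply a pointer into the byte buffer. */
typedef struct {
    const unsigned char *p;
} Utf8Iter;

/* Length in bytes of the sequence starting with this lead byte.
 * Assumes valid UTF-8. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Iterator-based primitive: decode one code point and advance.
 * Constant time per character. */
static unsigned long utf8_iter_next(Utf8Iter *it)
{
    const unsigned char *p = it->p;
    size_t len = utf8_seq_len(p[0]);
    unsigned long cp = (len == 1) ? p[0] : (p[0] & (0xFFu >> (len + 1)));
    for (size_t i = 1; i < len; i++)
        cp = (cp << 6) | (p[i] & 0x3Fu);
    it->p = p + len;
    return cp;
}

/* Index-to-iterator converter: the only O(n) step. */
static Utf8Iter utf8_iter_at(const char *s, size_t index)
{
    Utf8Iter it;
    it.p = (const unsigned char *)s;
    while (index-- > 0)
        it.p += utf8_seq_len(*it.p);
    return it;
}

/* Wrapper taking a character index, for callers that truly need
 * random access. */
static unsigned long utf8_char_at(const char *s, size_t index)
{
    Utf8Iter it = utf8_iter_at(s, index);
    return utf8_iter_next(&it);
}
```

A loop that calls utf8_char_at(s, i) for every i is quadratic, since each
call rescans from the start; the same loop written with one Utf8Iter and
repeated utf8_iter_next calls is linear.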

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED]
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}
