Re: on parrot strings

Jarkko Hietaniemi Fri, 18 Jan 2002 07:03:29 -0800

On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote:
> Thanks, Jarrko.
> 
> On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> > The most important message is that give up on 8-bit bytes, already.
> > Time to move on, chop chop.
> 
> Do you think/feel/wish/demand that the textual (string) APIs should differ 
> from the binary (byte) APIs?  (Both from an internal Parrot perspective and 
> at the language level.)


I tried to address this issue at two points in the document: the
section "Of Bits and Bytes", and one paragraph in "TO DO" that talks
about encoding conversions and I/O.  But I guess the answer is "yes
and yes": I think the APIs should be different.  It pains my UNIX
heart, but thinking in terms of just bytes was a convenient illusion
that worked only as long as we kept ourselves to character sets that
fit in 8-bit bytes.  I think the illusion works no more.
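
To make the distinction concrete, here is a minimal sketch of separate
byte and text APIs.  The type and function names (byte_buf, text_utf8,
bytes_length, text_length) are hypothetical and not taken from Parrot;
the text side assumes valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical types, not Parrot's: two distinct structs, so the
   compiler rejects passing a byte buffer where text is expected. */
typedef struct { const unsigned char *data; size_t nbytes; } byte_buf;
typedef struct { const unsigned char *data; size_t nbytes; } text_utf8;

/* Byte API: the length is simply the number of bytes. */
size_t bytes_length(byte_buf b)
{
    return b.nbytes;
}

/* Text API: the length is the number of characters (code points).
   Assumes valid UTF-8, where every byte that is not a continuation
   byte (bit pattern 10xxxxxx) starts a new character. */
size_t text_length(text_utf8 t)
{
    size_t n = 0;
    for (size_t i = 0; i < t.nbytes; i++)
        if ((t.data[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```

The same six bytes of "naïve" in UTF-8 have byte length 6 but text
length 5, which is exactly where the "just bytes" view stops working.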

> This may be beyond the scope of the document, but do you have an opinion on 
> whether strings need to be entirely encapsulated within a single structure, 
> or whether "virtual" strings (comprising several disparate substrings) are a 
> viable addition?  
> 
>       typedef struct parrot_sized_string {
>            UINTVAL                    size;
>            UINTVAL                    index;
>            UINTVAL                    index_offset;
>            UINTVAL                    last_offset;
>            UINTVAL                    size_valid:1;
>            UINTVAL                    offset_valid:1;
>            UINTVAL                    last_valid:1;
>            UINTVAL                    continued:1;
>            PARROT_STRING              string;
>            struct parrot_sized_string *string_continued;
>       } PARROT_SIZED_STRING;

First off, I think virtual strings (if you define strings as "a linear
collection of characters (or bytes)") are a great idea; that's why I
suggested them a while ago, even in the context of Perl 5 (though I
admit I also simply liked the proposed name: VVs...).  But I also think
they are high-level enough that they probably should not be part of
any of the low-level string structures.  For example: one nifty thing
you can do with virtual strings is make them read-only windows onto
another string, and I don't think the read-onlyness flag belongs in
the low-level strings: it's something coming from above.  Similarly
for virtual strings composed of slices of several other strings:
how do you manage the book-keeping of these other strings?  Too complex:
let's keep the low-level, ummm, low-level.
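
A rough sketch of that layering, under the assumption of a bare
low-level string with no flags and a window type built on top of it.
All names here (low_string, vstring, vstring_window, vstring_get,
vstring_set) are hypothetical and only illustrate the idea.

```c
#include <stddef.h>

/* Low-level string: just bytes and a length, no flags (hypothetical). */
typedef struct { unsigned char *data; size_t len; } low_string;

/* Higher-level virtual string: a window onto a low-level string.
   The read-only flag lives here, not in low_string. */
typedef struct {
    low_string *base;
    size_t      start;
    size_t      len;
    int         read_only;
} vstring;

/* Create a window, clamping start and len to the base string. */
vstring vstring_window(low_string *base, size_t start, size_t len,
                       int read_only)
{
    vstring v = { base, 0, 0, read_only };
    if (start > base->len)
        start = base->len;
    if (len > base->len - start)
        len = base->len - start;
    v.start = start;
    v.len = len;
    return v;
}

/* Read element i of the window; returns -1 if out of range. */
int vstring_get(const vstring *v, size_t i)
{
    return i < v->len ? v->base->data[v->start + i] : -1;
}

/* Write through the window; returns -1 if read-only or out of range. */
int vstring_set(vstring *v, size_t i, unsigned char c)
{
    if (v->read_only || i >= v->len)
        return -1;
    v->base->data[v->start + i] = c;
    return 0;
}
```

Two windows onto the same low_string can differ in writability without
the low-level structure knowing anything about it.  A virtual string
made of several slices would additionally need to track the lifetime
of every base string, which is the book-keeping problem noted above.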

> This was discussed earlier mostly for alleviating some of the headaches 
> associated with variable-width encodings. 

If we keep the low-level limited to just a handful of encodings
(I proposed three), and the variable-width encodings well-behaved
(UTF-8 as opposed to the gnarlier ones), I don't think the burden will
be too bad.
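
As an illustration of what "well-behaved" buys: in UTF-8 the start of
each character is recognizable from a single byte, so finding the Nth
character is a simple forward scan.  This is a minimal sketch assuming
valid UTF-8 input; utf8_char_offset is a hypothetical helper, not a
Parrot function.

```c
#include <stddef.h>

/* Return the byte offset of the character at char_index (0-based),
   or nbytes if the string has fewer characters.  Assumes valid UTF-8:
   continuation bytes have the form 10xxxxxx, and every other byte
   begins a character. */
size_t utf8_char_offset(const unsigned char *s, size_t nbytes,
                        size_t char_index)
{
    size_t seen = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80) {    /* ASCII or lead byte */
            if (seen == char_index)
                return i;
            seen++;
        }
    }
    return nbytes;
}
```

The scan is linear in the byte length, which is the cost that cached
offsets or virtual strings were meant to soften; with a stateful or
non-self-synchronizing encoding even this simple loop would not work.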

> -- 
> Bryan C. Warnock
> [EMAIL PROTECTED]

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
