On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote:
> Thanks, Jarrko.
>
> On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> > The most important message is that give up on 8-bit bytes, already.
> > Time to move on, chop chop.
>
> Do you think/feel/wish/demand that the textual (string) APIs should differ
> from the binary (byte) APIs?  (Both from an internal Parrot perspective and
> at the language level.)
I tried to address this issue in two places in the document: the section
"Of Bits and Bytes", and a paragraph in "TO DO" about encoding conversions
and I/O.  But I guess the answer is "yes and yes": I think the APIs should
be different.  It pains my UNIX heart, but thinking in terms of just bytes
was a convenient illusion that worked only as long as we kept ourselves to
8-bit character sets.  I think the illusion works no more.

> This may be beyond the scope of the document, but do you have an opinion on
> whether strings need to be entirely encapsulated within a single structure,
> or whether "virtual" strings (comprising several disparate substrings) are a
> viable addition?
>
> typedef struct parrot_sized_string {
>     UINTVAL size;
>     UINTVAL index;
>     UINTVAL index_offset;
>     UINTVAL last_offset;
>     UINTVAL size_valid:1;
>     UINTVAL offset_valid:1;
>     UINTVAL last_valid:1;
>     UINTVAL continued:1;
>     PARROT_STRING string;
>     struct parrot_sized_string *string_continued;
> } PARROT_SIZED_STRING;

First off, I think virtual strings (if you define strings as "a linear
collection of characters (or bytes)") are a great idea; that's why I
suggested them a while ago, even in the context of Perl 5 (though I admit
I also simply liked the proposed name: VVs...).  But I also think they are
high-level enough that they probably should not be part of the low-level
string structures.  For example, one nifty thing you can do with virtual
strings is to use them as read-only windows onto another string, and I
don't think the read-only flag belongs in the low-level strings: it's
something coming from above.  Similarly for virtual strings composed of
slices of several other strings: how do you manage the bookkeeping of
those other strings?  Too complex: let's keep the low-level, ummm,
low-level.

> This was discussed earlier mostly for alleviating some of the headaches
> associated with variable-width encodings.
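Returning to the virtual-string question: here is a minimal C sketch of the layering argued for above, where a virtual string is a list of read-only slices over plain low-level strings, so the low-level type needs no read-only flag. All names (ll_string, vs_slice, virtual_string, vs_byte_at and friends) are hypothetical, not Parrot's; indexing is by byte to keep it short, and the caller is assumed to keep the underlying strings alive, which is exactly the bookkeeping question left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level string: just a buffer and a length.
 * Note there is no read-only flag and no knowledge of views. */
typedef struct {
    const char *buf;
    size_t len;
} ll_string;

/* One slice of a virtual string: a window onto some low-level string. */
typedef struct {
    const ll_string *base;  /* not owned: the caller must keep it alive */
    size_t offset;
    size_t len;
} vs_slice;

/* A virtual string: an ordered list of slices. It is read-only by
 * construction, since it only holds const pointers to its bases. */
typedef struct {
    const vs_slice *slices;
    size_t count;
} virtual_string;

/* Make a slice, clamping the window so it stays inside the base string. */
static vs_slice vs_slice_make(const ll_string *base, size_t offset, size_t len)
{
    vs_slice s;
    assert(base != NULL);
    if (offset > base->len)
        offset = base->len;
    if (len > base->len - offset)
        len = base->len - offset;
    s.base = base;
    s.offset = offset;
    s.len = len;
    return s;
}

/* Total length (in bytes) of the virtual string. */
static size_t vs_length(const virtual_string *vs)
{
    size_t total = 0;
    for (size_t i = 0; i < vs->count; i++)
        total += vs->slices[i].len;
    return total;
}

/* Byte at a logical index, or -1 if out of range. Locating the right
 * slice is extra bookkeeping that a flat string never needs. */
static int vs_byte_at(const virtual_string *vs, size_t index)
{
    for (size_t i = 0; i < vs->count; i++) {
        const vs_slice *s = &vs->slices[i];
        if (index < s->len)
            return (unsigned char)s->base->buf[s->offset + index];
        index -= s->len;
    }
    return -1;
}
```

Everything the virtual layer knows (windows, composition, read-onlyness) lives above ll_string, which stays a plain buffer.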
If we keep the low-level limited to just a handful of encodings (I
proposed three), and the variable-width encodings well-behaved (UTF-8 as
opposed to the gnarlier ones), I don't think the burden will be too bad.

> --
> Bryan C. Warnock
> [EMAIL PROTECTED]

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
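To illustrate why a well-behaved variable-width encoding like UTF-8 need not be much of a burden, here is a minimal C sketch. The function names are hypothetical, not Parrot's API, and the input is assumed to be valid UTF-8 (no validation is done). The key property is that UTF-8 continuation bytes always have the form 10xxxxxx, so character boundaries can be found from any byte.

```c
#include <assert.h>
#include <stddef.h>

/* In UTF-8, continuation bytes always look like 10xxxxxx, so a code
 * point boundary is any byte that does not match that pattern. */
static int utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Number of code points in a buffer assumed to hold valid UTF-8. */
static size_t utf8_length(const unsigned char *s, size_t len)
{
    size_t count = 0;
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (!utf8_is_continuation(s[i]))
            count++;
    return count;
}

/* Byte offset at which code point n starts, or len if n is at or past
 * the end. This linear scan is the price variable width puts on indexing. */
static size_t utf8_offset_of(const unsigned char *s, size_t len, size_t n)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!utf8_is_continuation(s[i])) {
            if (n == 0)
                return i;
            n--;
        }
    }
    return len;
}

/* Step back from any byte offset to the start of its code point.
 * UTF-8 is self-synchronizing: valid input needs at most three steps. */
static size_t utf8_char_start(const unsigned char *s, size_t offset)
{
    while (offset > 0 && utf8_is_continuation(s[offset]))
        offset--;
    return offset;
}
```

The "gnarlier" encodings lack this self-synchronizing property, which is what makes random access and error recovery in them much harder.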