On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote:
> Thanks, Jarkko.
>
> On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> > The most important message is: give up on 8-bit bytes, already.
> > Time to move on, chop chop.
>
> Do you think/feel/wish/demand that the textual (string) APIs should differ
> from the binary (byte) APIs? (Both from an internal Parrot perspective and
> at the language level.)
I tried to address this issue at two points in the document: "Of Bits
and Bytes", and one paragraph in "TO DO" talking about encoding
conversions and I/O. But I guess the answer is "yes and yes": I think
the APIs should be different. It pains my UNIX heart, but thinking in
terms of just bytes was a convenient illusion that worked as long as we
kept ourselves to 8-bit byte character sets. I think the illusion
works no more.
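
A minimal sketch of why the two APIs give different answers, assuming
the text is valid UTF-8. The function names are hypothetical and are
not part of any Parrot API:

```c
#include <stddef.h>
#include <string.h>

/* Byte API: how many bytes the buffer holds, whatever they encode. */
size_t bytes_length(const char *buf)
{
    return strlen(buf);
}

/* Text API: how many characters (code points) the buffer holds.
 * Assumes valid UTF-8: continuation bytes look like 10xxxxxx and
 * are skipped, so only the first byte of each character counts. */
size_t utf8_chars_length(const char *buf)
{
    size_t n = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)buf; *p != 0; p++) {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For pure ASCII input the two functions agree, which is exactly the
illusion; as soon as one character needs more than one byte they diverge.
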
> This may be beyond the scope of the document, but do you have an opinion on
> whether strings need to be entirely encapsulated within a single structure,
> or whether "virtual" strings (comprising several disparate substrings) are a
> viable addition?
>
> typedef struct parrot_sized_string {
>     UINTVAL size;
>     UINTVAL index;
>     UINTVAL index_offset;
>     UINTVAL last_offset;
>     UINTVAL size_valid:1;
>     UINTVAL offset_valid:1;
>     UINTVAL last_valid:1;
>     UINTVAL continued:1;
>     PARROT_STRING string;
>     /* A pointer: a struct cannot contain itself by value. */
>     struct parrot_sized_string *string_continued;
> } PARROT_SIZED_STRING;
First off, I think virtual strings (if you define strings as "a linear
collection of characters (or bytes)") are a great idea; that's why I
suggested them a while ago even in the context of Perl 5 (though I
admit I also simply liked the proposed name: VVs...) But I also think
they are high-level enough that they probably should not be any of the
low-level string structures. For example: one nifty thing you can do
with virtual strings is that they can be read-only windows onto another
string, and I don't think the read-onlyness flag belongs to the
low-level strings: it's something coming from above. Similarly
for virtual strings composed of slices of several other strings:
how do you manage the book-keeping of these other strings? Too complex:
let's keep the low-level, ummm, low-level.
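
A rough sketch of the layering described above, with the read-only
window living above the low-level string rather than inside it. All the
type and function names here (LowString, StringView, view_init,
view_byte_at) are hypothetical illustrations, not Parrot structures:

```c
#include <stddef.h>

/* Hypothetical low-level string: just bytes and a length.
 * It knows nothing about windows or read-onlyness. */
typedef struct {
    char  *data;
    size_t len;
} LowString;

/* Hypothetical virtual string: a read-only window onto a LowString.
 * The const qualifier lives here, in the higher layer. */
typedef struct {
    const LowString *base;
    size_t start;
    size_t len;
} StringView;

/* Returns 0 on success, -1 if the window would fall outside base. */
int view_init(StringView *v, const LowString *base,
              size_t start, size_t len)
{
    if (start > base->len || len > base->len - start)
        return -1;
    v->base  = base;
    v->start = start;
    v->len   = len;
    return 0;
}

/* Returns the byte at position i of the window, or -1 if i is out
 * of range. There is deliberately no way to write through a view. */
int view_byte_at(const StringView *v, size_t i)
{
    if (i >= v->len)
        return -1;
    return (unsigned char)v->base->data[v->start + i];
}
```

Even this single-window case leaves open who owns the underlying
LowString and how long it must stay alive; with several slices per
virtual string that book-keeping multiplies.
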
> This was discussed earlier mostly for alleviating some of the headaches
> associated with variable-width encodings.
If we keep the low-level limited to just a handful of encodings
(I proposed three), and the variable-width encodings well-behaved (UTF-8
as opposed to the gnarlier ones), I don't think the burden will be too bad.
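
To illustrate what "well-behaved" buys: in UTF-8 the first byte of a
character announces how many bytes follow, so stepping to the n-th
character is a simple loop. This is only a sketch with hypothetical
function names; it checks the basic lead/continuation shape but does
not reject overlong or otherwise invalid forms:

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence whose first byte is b, or 0
 * if b cannot start a sequence (a continuation or invalid byte). */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Byte offset of character number n (0-based) in a NUL-terminated
 * buffer. Returns (size_t)-1 if the buffer has fewer than n + 1
 * characters or a malformed sequence is met on the way. */
size_t utf8_char_offset(const char *buf, size_t n)
{
    size_t off = 0;

    for (;;) {
        unsigned char b = (unsigned char)buf[off];
        size_t len, k;

        if (b == 0)
            return (size_t)-1;          /* ran out of characters */
        if (n == 0)
            return off;
        len = utf8_seq_len(b);
        if (len == 0)
            return (size_t)-1;
        /* Each trailing byte must be 10xxxxxx; this also stops us
         * from walking past the terminating NUL. */
        for (k = 1; k < len; k++) {
            if (((unsigned char)buf[off + k] & 0xC0) != 0x80)
                return (size_t)-1;
        }
        off += len;
        n--;
    }
}
```

The cost is linear in n rather than constant, which is the burden in
question; the gnarlier encodings make even this loop harder to write.
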
> --
> Bryan C. Warnock
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen