Will Coleda wrote:

- Which language targeting parrot requires graphemes? You say, "A
grapheme is our concept.", but then say, "Parrot must support
languages which manipulate strings grapheme-by-grapheme" ... but if
it's our own concept, surely there aren't any languages that can be
forcing us to require it.

I clarified this in the new draft. Graphemes give us a way to simulate the old character-by-character way of interacting with strings while still handling Unicode safely. Think of it as "Unicode support for languages too old to have Unicode built in."
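To make the "character-by-character" point concrete, here is a rough sketch in Python (not Parrot code). The helper name graphemes() is made up for illustration, and the grouping rule (a base code point plus any combining marks that follow it) is a simplification of real Unicode segmentation, assumed here only to show the idea.

```python
import unicodedata

def graphemes(text):
    """Group each base code point with the combining marks that follow it.

    Simplified rule, assumed for illustration; real grapheme
    segmentation has more cases than this.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new grapheme
    return clusters

# "resume" with both accents written as separate combining marks.
word = "re\u0301sume\u0301"

# Code-point view: 8 elements, and a naive reverse detaches the accents.
naive_reverse = word[::-1]

# Grapheme view: 6 elements, and each accent stays with its letter.
safe_reverse = "".join(reversed(graphemes(word)))
```

A language that only knows how to step through a string one "character" at a time would get the grapheme view, so operations like reverse, index, and length behave the way its users expect.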

- Can we get some discussion of the scope of the grapheme table
entries? Is this for a single running instance of parrot? How can
multiple running copies of parrot share strings if they have different
grapheme table entries? How does this impact bytecode generation?
freeze/thaw? What happens when someone constructs a string that blows
the table size?

The scope of a grapheme table is a single string. Most grapheme tables will be quite small, as combining characters are statistically rare even in languages that use them.

This is an architectural change from the earlier drafts of the PDD. A global table doesn't interact well with freeze/thaw, concurrency, garbage collection, continuation-passing style, software transactional memory, or regular expression backtracking. And the semantics offered by a global grapheme table can all be supported by a local grapheme table for each NFG string.
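A rough sketch of what a per-string grapheme table could look like, again in Python rather than Parrot's C. Everything here is hypothetical: the class name NFGString, the NFC pass before building the table, and the use of negative integers to point into the table are assumptions made for illustration, not the layout the PDD specifies.

```python
import unicodedata

class NFGString:
    """One integer per grapheme, plus a grapheme table owned by this string.

    Hypothetical layout: a non-negative code is an ordinary code point,
    a negative code -n refers to entry n-1 of this string's own table.
    """

    def __init__(self, text):
        self.codes = []    # one int per grapheme
        self.table = []    # local table: tuples of code points
        self._index = {}   # tuple of code points -> negative code
        cluster = []
        # Assumption: compose first, so only sequences without a
        # precomposed form need a table entry.
        for ch in unicodedata.normalize("NFC", text):
            if cluster and unicodedata.combining(ch) != 0:
                cluster.append(ord(ch))
            else:
                self._flush(cluster)
                cluster = [ord(ch)]
        self._flush(cluster)

    def _flush(self, cluster):
        if not cluster:
            return
        if len(cluster) == 1:
            self.codes.append(cluster[0])
            return
        key = tuple(cluster)
        if key not in self._index:
            self.table.append(key)
            self._index[key] = -len(self.table)
        self.codes.append(self._index[key])

    def __len__(self):
        return len(self.codes)

    def grapheme(self, i):
        code = self.codes[i]
        if code >= 0:
            return chr(code)
        return "".join(chr(cp) for cp in self.table[-code - 1])

# "q" with dot below and dot above has no precomposed form, so it needs
# a table entry; the entry is reused when the grapheme repeats.
s = NFGString("q\u0323\u0307xq\u0323\u0307")
```

Because the table is just another field of the string, anything that copies, freezes, or collects the string takes the table along with it, and no two strings ever need to agree on what a given negative code means.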

- Instead of saying "This PDD assumes for the moment that the current
string functions will on the whole be maintained", I would much rather
see the current API included in the document and reviewed as part
of the design. (Or point to another PDD that contains this API)

Yes, that's what I'm filling in now.

- In the same vein, I would also be curious to see a gap analysis (not
as part of this document); what is the scope of change to meet the
goals in the PDD?

Fairly conservative, at this point. A few additions to the core string structure. A new character set/encoding for NFG strings. A few functions will need to be modified, or split into a "grapheme" version and a "character" version.

Parrot's character set implementation already has "get_graphemes" and "set_graphemes" function pointers (for retrieving and setting 1 to n graphemes).

Allison
