On Sat, Apr 10, 2004 at 01:19:39PM +0300, Jarkko Hietaniemi wrote:
: I'm no Larry, either :-) but I think Larry is *not* saying that the
: "localeness" or "languageness" should hang off each string (or *shudder*
: off each substring).  What I've seen is that Larry wants the "level" to
: be a lexical pragma (in Perl terms).  The "abstract string" stays the
: same, but the operative level decides for _some_ ops what a "character"
: stands for.

Yes, just as an abstract position stays the same, but it may have
different numeric interpretations in different lexical scopes.
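
A minimal sketch of one position carrying different numbers, written in Python only for illustration. It assumes UTF-8 for the byte view and approximates a grapheme as a base character plus any following combining marks, which is simpler than full Unicode text segmentation. The names grapheme_starts and position_views are hypothetical.

```python
import unicodedata

def grapheme_starts(s):
    """Code point indices where an (approximate) grapheme begins.

    Assumption: a grapheme is a base character followed by zero or
    more combining marks. Real segmentation rules are more involved.
    """
    return [i for i, ch in enumerate(s)
            if i == 0 or unicodedata.combining(ch) == 0]

def position_views(s, grapheme_index):
    """Express one abstract position as an offset in three units."""
    starts = grapheme_starts(s) + [len(s)]
    cp = starts[grapheme_index]
    return {
        "graphemes": grapheme_index,
        "codepoints": cp,
        "bytes": len(s[:cp].encode("utf-8")),
    }

# "resume" with combining acute accents on both e's
text = "re\u0301sume\u0301"

# The position just before the "s" is one place in the string,
# but its number depends on the unit in force.
views = position_views(text, 2)
print(views)
```

Here the position is 2 in graphemes, 3 in code points, and 4 in UTF-8 bytes, yet it names the same spot in the string.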

: The default level should be somewhere between levels 1 and 2 (again, it
: depends on the ops).

Well, I don't think you can have a default between levels.  And if you
can, you shouldn't... :-)

I'd really like Perl 6 to default to grapheme level because that's what
the naive user will expect.  It'll be easy enough for the experts to
drop down to "use codepoints" or whatever the declaration turns out to be.

: For example, usually /./ means "match one Unicode code point" (a CCS
: character code).  But one can somehow ratchet the level up to 2 and make
: it mean "match one Unicode base character, followed by zero or more
: modifier characters".  For level 3 the language (locale) needs to be
: specified.

I really, really hate to call those "locales" because of all the
butchery that has happened in the name of locales.  If we give any
support to "locales" at all, it'll be at the low end at level 0.
If "language" isn't a good enough name for the distinctions at level 3,
then let's find a better name.  But it isn't "locale".

: As another example, bitstring xor does not make much sense for anything
: else than level zero.
: 
: The basic idea being that we cannot and should not dictate at what level
: of abstraction the user wants to operate.  We will give a default level,
: and ways to "zoom in" and "zoom out".

Yes, different views of a consistent semantics.  It's something Parrot
has to solve anyway to support multiple languages.  You might argue
that the four different levels of Unicode support in Perl 6 are really
four different languages, all called Perl.  Of course, we've said all
along that any time you say "use" you're mutating the language, so this
is nothing new...

In practical terms, one tricky thing to figure out is at what point
the number 3 gets turned into "3 bytes", "3 codepoints", "3 graphemes",
or "3 letters".
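
To make the question concrete, here is a rough Python sketch of what "the first 3" returns at each of the lower three levels. It assumes UTF-8 at the byte level and again treats a grapheme as a base character plus trailing combining marks. The "letters" level is left out because it depends on the language, and the helper names (split_graphemes, take) are made up for this example.

```python
import unicodedata

def split_graphemes(s):
    """Group code points into approximate graphemes.

    Assumption: any character with a nonzero combining class attaches
    to the preceding base character.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def take(s, n, level):
    """Return the first n units of s, where the unit is set by level."""
    if level == "bytes":
        return s.encode("utf-8")[:n]
    if level == "codepoints":
        return s[:n]
    if level == "graphemes":
        return "".join(split_graphemes(s)[:n])
    raise ValueError("unknown level: " + level)

# French "ete" with combining acute accents on both e's
word = "e\u0301te\u0301"

for level in ("bytes", "codepoints", "graphemes"):
    print(level, repr(take(word, 3, level)))
```

The same 3 yields one accented letter at the byte level, two letters at the codepoint level, and the whole word at the grapheme level, so the unit has to be fixed before the number means anything.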

: (If Larry is really saying that the "locale" should be an attribute of
: the string value, I'm on the barricades with you, holding cobblestones
: and Molotov cocktails...)

Me too, me too!  Oh, wait...  :-)  

Larry
