William D Clinger wrote:
I am posting this as an individual member of the Scheme
community.  I am not speaking for the R6RS editors.

Thomas Lord wrote:
Earlier revisions of the standard defined a portable character set,
allowing implementations to freely expand beyond that set.
In a portable program, if only the portable character set is
used, reliably portable behavior obtains.

What's different now is that Unicode has become an
established standard, and the portability advantages
of requiring Scheme programs to use Unicode (which
is more than just a character set) appear far larger
than any advantages that might still be derived from
allowing implementations and programs to choose their
own character sets.


Hopefully I'll finish the formal comment in time.  Briefly:

"Requiring [portable] Scheme programs to use Unicode [scalar values],"
in any reasonable sense of that phrase, is not at stake here.

"Forbidding implementations from supporting additional
characters," is one part of what is at stake.




In the R6 draft, the entire set of permitted characters is
explicitly enumerated.

Actually, I believe the set of permitted characters
is enumerated by reference to Unicode character
categories.  SFAIK, the set of characters in those
categories is still growing, albeit slowly.

I don't see the need for the word "actually," there -- I don't
think we're contradicting one another though I can understand
how a narrow interpretation of "explicitly enumerated" would
lead you there.


Moreover, the set's mapping to integer
values is both discontinuous and defined by three constants
that, a priori, appear to be arbitrary.

The constants are part of the Unicode standard, and
are more historical than arbitrary.  With hindsight
we all would have preferred a contiguous range, but
I understand the historical circumstances that led
to the hole in the middle.
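
For readers who have not met the constants under discussion, here is a minimal sketch in Python rather than Scheme. It assumes the three constants are #xD7FF, #xE000 and #x10FFFF, which bound the Unicode scalar values on either side of the surrogate range. The names below are illustrative only and come from neither the draft nor any implementation.

```python
# Assumed constants: the scalar values are every code point except the
# surrogates U+D800..U+DFFF, up to the last code point U+10FFFF.
LAST_BEFORE_HOLE = 0xD7FF
FIRST_AFTER_HOLE = 0xE000
LAST_SCALAR_VALUE = 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """Return True if n lies in one of the two scalar-value ranges."""
    return (0 <= n <= LAST_BEFORE_HOLE
            or FIRST_AFTER_HOLE <= n <= LAST_SCALAR_VALUE)


def scalar_value_count() -> int:
    """Count the scalar values by summing the sizes of the two ranges."""
    below_hole = LAST_BEFORE_HOLE + 1
    above_hole = LAST_SCALAR_VALUE - FIRST_AFTER_HOLE + 1
    return below_hole + above_hole
```

The "hole in the middle" is the gap between the two ranges tested in is_scalar_value.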


How do you get from there to mandating that hole (and much
else that it implies) in all implementations?   Relaxing those
restrictions would not seem to change the behavior of non-divergent
programs, except perhaps for those that use exceptions in some
particularly odd ways.


My question is whether any principled reason for these arbitrary
constants is given that might be supported without appeal
to analogies to other programming languages.

SFAIK, the justification for the constants has naught
to do with other programming languages, but with Scheme
and Unicode.  Of all Unicode concepts, the one that comes
closest to Scheme's historical notion of a character is
the Unicode notion of a scalar value.

Scheme could have defined its own encoding of scalar
values, and that range could have been contiguous, but
that would have been a Seriously Bad Idea.  Using some
Scheme-specific encoding would have created enormous
confusion and made interfacing with other systems more
difficult.

You're skirting around the issue of permitting vs. forbidding extensions
that shouldn't have any impact on well-written portable programs.


Note that there is a fine distinction to be made between arbitrary
choices such as the numeric values assigned to portable characters,
and arbitrary choices such as a mandatory domain restriction
on INTEGER->CHAR.   In the former, if CHAR<->INTEGER
conversion is to be supported at all, it is clear that *some* arbitrary
choice must be made and so, of course, appeal to a popular standard
for that.   In the latter case, the domain restriction, there is no obvious
reason to believe any such restriction is needed or makes the language
better than another language without that restriction.

Even in the latter case, the report should state the domain
for which integer->char can be relied upon to behave portably.


Yet it does more than that.  If it only specified what can be relied upon
in a portable program this conversation would have a different form.

The conversation would probably still occur, though, because (as I'll try
to explain in the formal comment, and here I thought I'd given up)
once you start to unpack the loosening of that restriction, a whole bunch
of other changes follow.


Your question seems to come down to whether that procedure
should be required to raise an exception when given values
outside its portable domain:

So, how does it come to pass that those patently arbitrary aspects of Unicode
appear in the report not as a set of domain limits within which
the behavior of portable programs is assured, but as restrictions that forbid
an implementation from expanding the domains and ranges of certain
standard procedures?
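
The two readings being contrasted here can be sketched as follows, again in Python rather than Scheme. This is only an illustration under stated assumptions: the function names and the "extension" hook are hypothetical, and the portable domain is assumed to be the two scalar-value ranges 0..#xD7FF and #xE000..#x10FFFF.

```python
def in_portable_domain(n: int) -> bool:
    """Assumed portable domain: the two Unicode scalar-value ranges."""
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF


def integer_to_char_strict(n: int) -> str:
    """Restriction reading: values outside the domain must raise."""
    if not in_portable_domain(n):
        raise ValueError(f"{n:#x} is outside the portable domain")
    return chr(n)


def integer_to_char_permissive(n: int, extension=None):
    """Domain-limit reading: behavior is assured inside the domain,
    and left to the implementation (the hypothetical 'extension'
    callable) outside it."""
    if in_portable_domain(n):
        return chr(n)
    if extension is None:
        raise ValueError(f"{n:#x} is not supported by this implementation")
    return extension(n)
```

Both functions agree on every value in the portable domain. They differ only in whether an implementation may assign a meaning to the remaining integers.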

The argument, I believe, is that passing a non-portable value
to integer->char is likely to be a common error, especially
among programmers who are just now learning about Unicode or
were introduced to Unicode in programming languages that were
standardized back when Unicode was expected to use a 16-bit
character set, and that allowing such non-portable arguments
to integer->char would, if allowed by the report, also be a
common error among implementors who are just now learning
about Unicode or were introduced to Unicode in programming
languages that were standardized back when Unicode was expected
to use a 16-bit character set.

Excellent. That is what tuned-for-education implementations like the PLT family
are for.


Making it clear up front that desiring to pass non-portable
values to integer->char is a grievous conceptual error will
save everyone a lot of grief later.

That's not the proper function of the report, in my opinion.   I also
disagree with the use of the word "everyone".


There is a legal question at issue: how certain procedures should
be specified.   But the larger question is on what basis, by what ways,
should such specifications be decided?

If R6 is simply to be a record of votes taken, a kind of tallying up
of a political process with purely pragmatic aims, then perhaps
it is no longer a "report" at all.   The line of thought that started
with the "ultimate" papers has ended.   What carries on, in its place,
is a particular *use* of the main tangible artifact of that line of
thought.   And, in that case, the introduction should certainly be
purged or retitled "Obituary" and the document as a whole
retitled.

I have some sympathy for that point of view.  I have less
sympathy for that point of view with respect to Unicode than
with several other parts of the report, however, because I
think the draft report's treatment of Unicode is one of the
more compelling arguments to be made in its favor.


Lemme see if I can pull off the formal comment.  If I had to bet, and
if I do finish it, it'll go down in flames with the editors, but
hopefully it will at least be a fun and provocative read, clarifying
some of my position.


-t



Will

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss


