Re: [r6rs-discuss] perhaps i should be formal, but....

Thomas Lord Wed, 14 Mar 2007 14:11:57 -0800

Well, I wasn't gonna say it but I'm glad *someone* brought
up this use-case.


(And, no, this use-case isn't my main inspiration but it is on
the list of things I think about in this area.)

-t


Shiro Kawai wrote:

From: William D Clinger <[EMAIL PROTECTED]>
Subject: Re: [r6rs-discuss] perhaps i should be formal, but....
Date: Wed, 14 Mar 2007 16:16:11 -0400

I am posting this as an individual member of the Scheme
community.  I am not speaking for the R6RS editors.

Thomas Lord wrote:

Earlier revisions of the standard defined a portable character set,
allowing implementations to freely expand beyond that set.
In a portable program, if only the portable character set is
used, reliably portable behavior obtains.

What's different now is that Unicode has become an
established standard, and the portability advantages
of requiring Scheme programs to use Unicode (which
is more than just a character set) appear far larger
than any advantages that might still be derived from
allowing implementations and programs to choose their
own character sets.


I also want to reserve the possibility of using different
character sets / encodings, but I agree that Unicode is
the only practical standard for portable programs.  I'm
happy as long as R6RS does not prohibit an implementation
from using alternative character sets / encodings if it wishes.

  For example, Japan's official family registration system
  uses its own character set and code points, since it needs
  to distinguish subtler differences between characters than
  Unicode does.  (There's no clear line between abstract characters
  and glyphs---it is context-dependent, and for family names
  the line gets closer to glyphs).

The range restriction on integer->char seems a reasonable
way to guarantee portable behavior.  An implementation
can provide another procedure to deal with non-Unicode
ranges / character sets.

That said, I feel it would be better if the standard used clearer
names for conversion between integers and characters, such as
unicode-scalar-value->char (or some abbreviation of it), which
makes it clear that one can't pass a non-Unicode scalar value.
This isn't a strong desire, though.
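
For readers who want the range restriction spelled out, here is a
minimal sketch, assuming the restriction means "Unicode scalar values
only", i.e. integers in [0, #xD7FF] or [#xE000, #x10FFFF] (all code
points except surrogates).  It is written in Python rather than Scheme
purely for illustration; the names is_unicode_scalar_value and
scalar_value_to_char are hypothetical and appear in no standard or
draft.

```python
# Illustrative sketch only: models a range-restricted integer->char.
# All names here are hypothetical, not taken from any Scheme report.

MAX_CODE_POINT = 0x10FFFF   # highest Unicode code point
SURROGATE_FIRST = 0xD800    # first surrogate code point
SURROGATE_LAST = 0xDFFF     # last surrogate code point


def is_unicode_scalar_value(n):
    """Return True if n is an integer in [0, 0xD7FF] or [0xE000, 0x10FFFF]."""
    if not isinstance(n, int) or isinstance(n, bool):
        return False
    if n < 0 or n > MAX_CODE_POINT:
        return False
    # Surrogates are code points but not scalar values.
    return not (SURROGATE_FIRST <= n <= SURROGATE_LAST)


def scalar_value_to_char(n):
    """Convert a Unicode scalar value to a one-character string.

    Raises ValueError for anything outside the scalar-value range,
    which is the portable behavior the restriction is meant to ensure.
    """
    if not is_unicode_scalar_value(n):
        raise ValueError(f"not a Unicode scalar value: {n!r}")
    return chr(n)
```

Under this reading, an implementation with an extended character set
would expose a separate, differently named conversion procedure for
values outside that range, leaving the portable one strict.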

The wording of the "character" object definition, however, could
be changed.  It is unclear to me whether (char? <obj>) can return
#t if <obj> is in the implementation's extended character
set but not in Unicode.  If it can't, I can still provide
(extended-char? <obj>) for example, but it's a bit awkward.
The same goes for other procedures that deal with characters: in
(string <char> ...), should each <char> be in Unicode? (I hope not!)

--shiro


_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss


