Re: [r6rs-discuss] [ANN] scheme-reports.org

Brian Mastenbrook Mon, 24 Aug 2009 19:42:13 -0700

On Aug 24, 2009, at 8:37 PM, John Cowan wrote:

>> I'm also not convinced by the argument that a string of length one
>> removes the need for a separate tagged representation for the units
>> of which the string is composed. The most primitive facility provided
>> by any decoder or encoder is a mapping between code points and
>> sequences of bytes; when working at that level, I'd prefer to have a
>> type with a disjoint predicate representing the well-defined input
>> type I am receiving.
>
> I provide another intuition pump.  Back in the Very Old Days, when
> symbols were the only kind of strings Lisp had, people did string work
> with EXPLODE and IMPLODE, mapping symbols to and from a list of the
> characters in the symbol's print name.  Those characters were
> themselves symbols, not a distinct datatype.  That worked fine.


Obviously not so fine that we're still using this mechanism, but I  
don't think the reasons why not have much to do with the issue at  
hand. :-)

> The argument from encoding seems irrelevant to me.  One can do a
> *better* job of encoding if handed whole strings: the string
> "a\x0301;" can be intelligently encoded into ISO 8859-1 as the
> bytevector #vu8(#xE1), whereas the individual character #\x301 can't
> be encoded in 8859-1 at all.

I mentioned this to you out-of-band, but for the benefit of the list,
I'm reasonably sure that it's not a good idea to have an
encoder/decoder which is not idempotent. It's tempting to "try harder"
to encode a string by using a pre-composed character in this case, but
I'd suggest that it'd be better to ensure that all strings are in NFC
internally if the application might need to use an encoding like
latin-1.
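
To make the concern concrete, here is a minimal sketch. It is in Python
rather than Scheme, only because Python's standard unicodedata module
makes it easy to run; the helper names encode_latin1_nfc and round_trips
are hypothetical and not part of any proposal on this list. It shows an
encoder that "tries harder" by composing to NFC first: the decomposed
string no longer survives an encode/decode round trip, while a string
that is already in NFC does.

```python
import unicodedata


def encode_latin1_nfc(text: str) -> bytes:
    """Hypothetical 'try harder' encoder: compose to NFC, then encode
    strictly as ISO 8859-1 (raises UnicodeEncodeError if impossible)."""
    return unicodedata.normalize("NFC", text).encode("latin-1")


def round_trips(text: str) -> bool:
    """True if decoding the encoded bytes gives back exactly `text`."""
    return encode_latin1_nfc(text).decode("latin-1") == text


decomposed = "a\u0301"  # 'a' followed by COMBINING ACUTE ACCENT
precomposed = unicodedata.normalize("NFC", decomposed)  # U+00E1

# Both spellings encode to the same single byte 0xE1 ...
same_bytes = encode_latin1_nfc(decomposed) == encode_latin1_nfc(precomposed)

# ... so only the string that was already in NFC round-trips.
decomposed_ok = round_trips(decomposed)
precomposed_ok = round_trips(precomposed)
```

If strings are kept in NFC internally, the normalization step inside the
encoder becomes a no-op and the round trip is exact for anything that can
be encoded at all.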

> Likewise, since integer->char isn't total, there's no real reason why
> a version of char->integer that accepts single-codepoint strings
> should be either.  As I pointed out in my last posting, this says
> nothing about the underlying implementation, which may well represent
> single-codepoint strings specially.


At this point I'm reasonably convinced that the data type for code  
points need not be disjoint from the string data type, though I think  
that a predicate to distinguish a string containing a single code  
point from other strings is still necessary.
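
As a rough illustration of what I mean, here is a sketch in Python (not
Scheme), where strings are already sequences of code points and there is
no separate character type. The names is_single_code_point and
code_point_of are hypothetical. The predicate picks out the
single-code-point strings, and the conversion to an integer is
deliberately partial, in the same way John describes for a char->integer
that accepts strings.

```python
def is_single_code_point(value: object) -> bool:
    """Hypothetical predicate: true only for a string holding exactly
    one code point. Python's len() on str counts code points."""
    return isinstance(value, str) and len(value) == 1


def code_point_of(s: str) -> int:
    """Partial conversion: defined only on single-code-point strings."""
    if not is_single_code_point(s):
        raise ValueError("expected a string of exactly one code point")
    return ord(s)
```

For example, is_single_code_point("a\u0301") is false, because that
string holds two code points even though it displays as one accented
letter, while code_point_of("\u0301") yields the integer #x301.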
--
Brian Mastenbrook
[email protected]
http://brian.mastenbrook.net/


_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
