| Date: Sun, 20 Sep 2009 10:29:51 -1000 (HST)
 | From: Shiro Kawai <[email protected]>
 | 
 | From: Thomas Lord <[email protected]>
 | Date: Sun, 20 Sep 2009 12:07:51 -0700
 | 
 | ...
 | > I noticed in the list of string-set! uses that Aubrey
 | > posted from SLIB, one of the uses came from a library
 | > that provided a "format" function: something that takes
 | > a format string and a bunch of other parameters and
 | > creates a new string (like sprintf in C).  That 
 | > strikes me as another case where string mutation is
 | > very handy for avoiding excess data copying and 
 | > consing.
 | 
 | Here I'd like to hear from Aubrey; to me, formatting
 | is one area where a string-builder pattern makes
 | much more sense, since the length of the final string
 | isn't generally known beforehand (and Gauche's format
 | is implemented that way).  What kind of advantage
 | did you see from using string-set! in format?

Dirk Lutzebaeck and Ken Dickey were the original authors.  Most of the
uses of STRING-SET! are filling the mantissa and exponent strings
created by MAKE-STRING.  All appear to put each character at the
end, so I expect a string port would work just as well.
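A minimal sketch of that string-port alternative, using R6RS
(rnrs io ports): instead of pre-allocating with MAKE-STRING and
filling in characters with STRING-SET!, accumulate them in an
output string port.  The helper name DIGITS->STRING is hypothetical,
not taken from SLIB.

```scheme
(import (rnrs))

;; Build a digit string by appending to an output string port,
;; rather than mutating a pre-sized string in place.
(define (digits->string digits)   ; digits: a list of integers 0-9
  (let-values (((port extract) (open-string-output-port)))
    (for-each
     (lambda (d)
       (put-char port (integer->char (+ d (char->integer #\0)))))
     digits)
    (extract)))

(display (digits->string '(3 1 4 1 5)))   ; prints 31415
```

The length of the result never has to be known up front, which is
exactly the property wanted when formatting a mantissa or exponent.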

 | > In any application where I/O filtering (read 
 | > some input, tweak it, write output) needs to be
 | > efficient, again, to avoid excessive data copying
 | > string mutation is a big boon.
 | 
 | ...
 | > Given the problematics of Unicode encoding,
 | > I think the time is ripe to bite the bullet and
 | > make the primitive string-replace! (which replaces
 | > in situ an arbitrary substring with an arbitrary string).
 | 
 | Right.  I always feel that protecting just string-set! and
 | string-fill! doesn't make sense.  If the mutable-string camp
 | insists on length-changing operations as well, then it makes
 | much more sense.

Unicode doesn't play well with a character datatype.  Downcasing or
foldcasing a single scalar value can result in a string of length 2.
If anyone cares, other Unicode-supporting language development efforts
seem to be moving away from the character datatype:

 According to <http://javascript.crockford.com/survey.html>, JavaScript
 lacks chars:

   String is a sequence of zero or more Unicode characters. There is no
   separate character type.  A character is represented as a string of
   length 1.

 Ruby 1.8 used integers for chars (like C).  Ruby 1.9 returns length 1
 strings from indexing strings.

 According to
 <http://www.win.tue.nl/~wstomv/edu/python/python-observations.html#Characters>
 Python lacks chars:

   Characters

   Python has no character type (in contrast to Pascal and C/C++).
   Although a string is a sequence type, the elements of a string are
   not "true" objects by themselves.

   Strings of length one are used as characters, e.g. in the built-in
   functions chr() and ord().
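
The length-changing case mapping mentioned above can be seen directly
in R6RS itself: the (rnrs unicode) library's string-upcase does full
case mapping, while char-upcase, being character-to-character, cannot
change length and so must leave such characters alone.  A small sketch:

```scheme
(import (rnrs))

;; Full case mapping on strings can grow the string: the single
;; character #\ß upcases to the two characters "SS".
(display (string-upcase "Straße"))   ; prints STRASSE (6 chars in, 7 out)
(newline)

;; The character-level operation has no room for that, so it is
;; the identity on #\ß.
(display (char-upcase #\ß))          ; prints ß
```

This asymmetry between the string and character operations is one
concrete way a character datatype sits awkwardly with Unicode.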

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
