Re: [r6rs-discuss] Proposed NON-features for small Scheme, part 8: string-set! must die

Thomas Lord Sat, 19 Sep 2009 16:30:35 -0700

As a kind of counter-proposal, a perhaps sane "core Scheme"
(a subset of "small Scheme") could offer:


1. *Some* low-level solution for syntactic abstraction
   (I won't dwell on my opinion of what exactly,
   since that would be controversial.)

2. *Some* low-level solution for reader extensions.

3. *Some* low-level but fairly abstract system of 
   ports and an environment in which certain resources
   (like libraries) can be "opened" to create a port.

4. Low-level vectors - both immutable and mutable.

5. Lambda.

6. A constructor for disjoint types that can wrap
   an arbitrary value.

7. Fixnums.  (I/O on low-level ports consumes and yields these.)

8. *Perhaps* mutable lambdas (objects which can be 
   applied but you can modify them to change which
   simple lambda is invoked when applying them).


That'd be about it.
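
To make item 8 a bit more concrete, here is a rough sketch of what I
mean by a "mutable lambda".  It is written in Python rather than Scheme
purely for illustration, and the names MutableLambda and set_target are
made up here; they are not part of any proposal.

```python
class MutableLambda:
    """An applicable object whose underlying procedure can be swapped."""

    def __init__(self, target):
        self._target = target  # the "simple lambda" currently invoked

    def set_target(self, target):
        # Change which simple lambda later applications will invoke.
        self._target = target

    def __call__(self, *args):
        # Applying the object delegates to whatever target is current.
        return self._target(*args)


f = MutableLambda(lambda x: x + 1)
before = f(10)                 # applies the original lambda
f.set_target(lambda x: x * 2)
after = f(10)                  # same object, different behaviour
```

The point is only that the identity of the applicable object stays
fixed while the procedure behind it changes.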

Everything in Small Scheme can be *explained* fairly 
well in terms of those things.   For example, cons-pairs
are length-two vectors wrapped up as a disjoint type.
Flonums can be explained as a fixnum or pair of fixnums
wrapped up as a disjoint type.
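
The cons-pair case might look roughly like the following.  Again this
is a hedged sketch in Python, not Scheme: make_disjoint_type is a
hypothetical stand-in for the constructor in item 6, and a Python list
stands in for a mutable length-two vector from item 4.

```python
def make_disjoint_type(name):
    """Return (wrap, is_instance, unwrap) for a fresh type.

    Each call creates a new class, so values of one disjoint type are
    never mistaken for values of another.
    """

    class Wrapper:
        __slots__ = ("payload",)

        def __init__(self, payload):
            self.payload = payload

        def __repr__(self):
            return f"#<{name}>"

    def wrap(value):
        return Wrapper(value)

    def is_instance(obj):
        return type(obj) is Wrapper

    def unwrap(obj):
        if not is_instance(obj):
            raise TypeError(f"not a {name}")
        return obj.payload

    return wrap, is_instance, unwrap


# A pair is a length-two mutable vector wrapped as its own type.
_wrap_pair, is_pair, _pair_vector = make_disjoint_type("pair")


def cons(a, d):
    return _wrap_pair([a, d])


def car(p):
    return _pair_vector(p)[0]


def cdr(p):
    return _pair_vector(p)[1]


def set_car(p, value):
    _pair_vector(p)[0] = value


p = cons(1, cons(2, None))
```

A bare two-element vector is not a pair under this scheme; only values
built by cons satisfy is_pair.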

Real "core Scheme" code can actually implement those
familiar types in that way.  In a truly minimalist
implementation, that could be useful in practice.  As a
semantic model, it would be useful in any case.

Of course, most implementations would natively implement
many more types and features than what I described.
But the specification for those additional types and
features could be expressed quite precisely as core
Scheme code.

There would be less pressure, in this kind of approach,
to haggle over questions like "Strings: mutable or not?"
We can define both.  We can treat the traditional string
operators as generics that can work on either.  We can
quibble over exactly which ones are *required* in Small
Scheme but also enjoy that Small Scheme supports either
one.
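
As a rough illustration of "define both", here is a sketch (in Python,
not Scheme, and with made-up names throughout) in which the read-only
string operators are generic over a mutable and an immutable string
type, while the mutator accepts only the mutable one.

```python
class ImmutableString:
    def __init__(self, text):
        self._chars = tuple(text)  # immutable vector of characters


class MutableString:
    def __init__(self, text):
        self._chars = list(text)   # mutable vector of characters


def string_length(s):
    # Generic: works on either kind of string.
    return len(s._chars)


def string_ref(s, k):
    # Generic: works on either kind of string.
    return s._chars[k]


def string_set(s, k, ch):
    # Only meaningful for the mutable kind.
    if not isinstance(s, MutableString):
        raise TypeError("string-set!: string is immutable")
    s._chars[k] = ch


def string_to_text(s):
    return "".join(s._chars)


m = MutableString("cat")
i = ImmutableString("cat")
string_set(m, 0, "b")   # fine: m now spells "bat"
```

Calling string_set on i raises an error in this sketch; whether a
standard would require that, or require string_set at all, is exactly
the sort of thing left to quibble over.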

-t



On Sat, 2009-09-19 at 01:24 -0400, John Cowan wrote:
> This is a proposal for the removal of string-set! (and consequently
> string-fill!) from the R7RS small Scheme language.  I am publishing this
> document to invite wide comment.  There is nothing official about it.
> I very gratefully acknowledge the kind help of Alex Shinn, who provided
> the topic sentences for most of the paragraphs below.  However, I retain
> sole responsibility for this document, including all errors.
> 
> I believe that despite the prescription of the draft WG1 charter that
> no features of IEEE Scheme (a subset of R4RS) should be removed from
> R7RS small Scheme, an exception should be made for string-set!, for at
> least the following reasons:
> 
> 1) Immutable strings are more purely functional, and allow many
> optimizations, such as being transparently and freely shareable between
> procedures and between threads without concern for uncontrolled mutation.
> For this and other reasons, the general trend in new languages/runtimes
> such as Java and C# is toward immutable strings; unfortunately, this is
> the kind of argument that Schemers usually don't like, so I won't bother
> mentioning it.  :-)
> 
> 2) Algorithms where you want to modify strings in the middle are rare,
> and many of the classic devices (such as string-upcase!, a procedure that
> mutates a string in place) are awkward or impossible with representations
> that make use of characters of variable length such as UTF-8.  Typical
> string algorithms want to also be able to do insertions and deletions,
> which are not directly possible with classical Scheme strings.  Better
> representations such as trees of immutable strings do allow such changes,
> as well as making string appends O(n) in the number of strings rather
> than in the sum of their lengths.
> 
> 3) If strings are immutable, it's possible to have both fast O(1)
> access to individual characters or substrings, and fairly space-efficient
> representation of full Unicode strings, by using different representations
> for strings drawn from different character repertoires.  For example,
> an implementation might use 8-bit code units when all characters are
> less than #x100, 16-bit code units when all characters are less than
> #x10000, and 32-bit code units otherwise.
> 
> Unfortunately, mutating even a single character in such a representation
> may require the entire string to be copied, which means that it also
> requires indirection through a separate header that can be redirected
> to point to the newly allocated code unit sequence.  Immutable strings
> can just *be* their sequences, with a few extra bits indicating the
> size of the code units, although this design does prevent easy sharing
> of substrings.
> 
> 4) As currently designed, strings are functionally just vectors of
> characters.  In an 8-bit world, using the traditional representation
> of strings carries a 4:1 storage advantage, making it worthwhile
> to distinguish them clearly from general vectors.  But 21-bit Unicode
> characters are a much better fit, if represented as immediate (unboxed)
> values, for general vectors using 32-bit pointers.  Granted that not all
> small Scheme systems will provide full Unicode support, general vectors
> start to look much less expensive than they once were.  In short: if
> you want something that behaves like a vector of characters, simply use
> a general vector that contains characters.
> 
> 5) Making strings immutable also permits a design in which all strings
> are Unicode-normalized.  Though this has its own costs (for example,
> appending two strings may create a new string whose length is different
> from the lengths of the two source strings), it would be effectively
> impossible where arbitrary mutation is allowed.
> 
> 
> As a consequence of removing string-set!, string-fill! (not in IEEE
> Scheme) becomes impossible and string-copy less useful.  I do not propose
> to remove string-copy, however, because it can eliminate space leaks
> that are caused by taking a small shared substring of a large existing
> string: when the larger string should be GC'ed, it is retained as a
> whole because of the shared substring.  Using string-copy judiciously
> can prevent such leaks.
> 
> -- 
> John Cowan  [email protected]  http://ccil.org/~cowan
> If he has seen farther than others,
>         it is because he is standing on a stack of dwarves.
>                 --Mike Champion, describing Tim Berners-Lee (adapted)
> 
> _______________________________________________
> r6rs-discuss mailing list
> [email protected]
> http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss

