Re: [r6rs-discuss] Proposed NON-features for small Scheme, part 8: string-set! must die

Shiro Kawai Sun, 20 Sep 2009 13:28:12 -0700

From: Thomas Lord <[email protected]>
Subject: Re: [r6rs-discuss] Proposed NON-features for small Scheme, part 8: 
string-set! must die
Date: Sun, 20 Sep 2009 12:07:51 -0700


> > RnRS abandoning mutable strings does *not* prevent such
> > tiny Scheme from having mutable strings as an implementation's
> > extension.
> 
> And vice versa.   

No.  There's an asymmetry here.   

* Scheme with mutable-only strings can still use
  string libraries that are written for immutable strings.
* Scheme with immutable-only strings cannot use
  string libraries that are written for mutable strings.

> Both are "nice to have" and I would expect that
> most implementations will want to support both.
> It would be good to sanctify some specification 
> of both types and how they relate.

The problem is that introducing mutable strings suddenly
bloats the spec.  You have to mark which operations return
immutable strings.  Substring-like operations need two
versions: one returning a fresh string and the other
returning a possibly shared string.

Of course the same can be said of pairs and vectors, but
their usage patterns are quite different.  I don't think we
should overgeneralize here.

> > [...]
> > Requiring string ports (string builder) shouldn't be much
> > burden to the tiny Scheme; 
> 
> String ports are an example of a generic
> problem for which disjointed, piecemeal 
> solutions seem the wrong approach (puns intended).
[snip]

Yes, it's good to have a small, clear core built on a generic
approach.  Let's have it.  But what does that have to do
with mutable/immutable strings?

> > I feel that your discussion explains why mutable
> > string benefits tiny Scheme, but doesn't support why
> > mutable strings should be in the standard.
> 
> That is because we have to first agree on the 
> desired form and function of the standard.
[...]
> My thought for R7/small is for an even smaller
> than traditional core, with "the rest" given both
> narrative and code definitions.
[snip]

I basically agree with your discussion here.  From my side,
we can provide the code definitions of, say, string
ports, via mutable vectors and vector->string; so far
it seems orthogonal to the mutable/immutable string discussion.
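
To make that concrete, here is a minimal sketch of a string
builder that accumulates characters in a growable mutable vector
and produces the string only at the end.  The names make-builder,
builder-add! and builder->string are hypothetical, and the sketch
assumes an R7RS-style (scheme base) that provides vector->string
with start/end arguments.

```scheme
(import (scheme base))

;; A builder is a pair: (buffer-vector . number-of-chars-used).
;; All names here are illustrative, not from any standard.
(define (make-builder)
  (cons (make-vector 16 #\space) 0))

(define (builder-add! b ch)
  (let ((buf (car b)) (n (cdr b)))
    ;; When the buffer is full, copy it into one twice as large.
    (when (= n (vector-length buf))
      (let ((new (make-vector (* 2 n) #\space)))
        (let loop ((i 0))
          (when (< i n)
            (vector-set! new i (vector-ref buf i))
            (loop (+ i 1))))
        (set-car! b new)))
    (vector-set! (car b) n ch)
    (set-cdr! b (+ n 1))))

;; Only here is a string created; no string is ever mutated.
(define (builder->string b)
  (vector->string (car b) 0 (cdr b)))
```

A string output port could be layered on top of this by calling
builder-add! for each character written.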

> > It is plausible, but could you support your opinion with
> > some concrete observation, experience, or algorithms?
> > The counter observation of that 9 years of experience in
> > Gauche community.
> 
> One of the more fun projects I've done in Scheme
> was an Emacs-like text editor.   For that, I found
> a very nice data-structure (good trade-offs) was
> a kind of unholy mix of "gap buffers" (like in GNU
> Emacs) with "ropes" (big strings represented as 
> (in this case) splay trees of smaller strings).  
> Modifying strings in the middle was important to 
> good performance for this.  Not being able to modify
> strings in the middle with expected-case decent
> efficiency would have meant too much copying of data
> or too high a fragmentation of long strings.

If you represent the entire text in an elaborate
structure, why do the leaves need to be Scheme
strings?   You cannot treat the entire text or
subtrees of it as a Scheme string anyway; you need
a special API to deal with them.  Then you can just
use mutable vectors in the leaf nodes as well.
(Of course, if your Scheme has mutable strings
then it's ok to use them.  A portable library with
optional implementation-specific optimizations
can be configured either way.)

> I noticed in the list of string-set! uses that Aubrey
> posted from SLIB, one of the uses came from a library
> that provided a "format" function: something that takes
> a format string and a bunch of other parameters and
> creates a new string (like sprintf in C).  That 
> strikes me as another case where string mutation is
> very handy for avoiding excess data copying and 
> consing.

Here I'd like to hear from Aubrey; to me, formatting
is one area where a string-builder pattern makes
much more sense, since the length of the final string
isn't generally known beforehand (and Gauche's format
is implemented that way).   What advantage
did you see in using string-set! in format?
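
As an illustration of the string-builder pattern for formatting,
here is a rough sketch that writes into a string output port and
extracts the result at the end.  mini-format is a hypothetical
name; this is not how SLIB's or Gauche's format is actually
written.  It assumes open-output-string and get-output-string
(SRFI 6, also in R7RS (scheme base)).

```scheme
(import (scheme base) (scheme write))

;; Hypothetical mini-format.  "~a" inserts the next argument using
;; display; "~" followed by any other character emits that
;; character literally (so "~~" yields a single tilde).
(define (mini-format fmt . args)
  (let ((out (open-output-string))
        (len (string-length fmt)))
    (let loop ((i 0) (args args))
      (if (< i len)
          (let ((ch (string-ref fmt i)))
            (if (and (char=? ch #\~) (< (+ i 1) len))
                (let ((d (string-ref fmt (+ i 1))))
                  (cond ((char=? d #\a)
                         (display (car args) out)
                         (loop (+ i 2) (cdr args)))
                        (else
                         (write-char d out)
                         (loop (+ i 2) args))))
                (begin
                  (write-char ch out)
                  (loop (+ i 1) args))))
          ;; The final length is known only now.
          (get-output-string out)))))
```

For example, (mini-format "~a + ~a = ~a" 1 2 3) yields
"1 + 2 = 3" without ever calling string-set!.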

> In any application where I/O filtering (read 
> some input, tweak it, write output) needs to be
> efficient, again, to avoid excessive data copying
> string mutation is a big boon.

Any *PORTABLE* I/O filtering in the character/string
domain has to accept the fact that arbitrary
binary<->character conversion could be inserted during
input and output.  If you don't like that, you need
to roll your own with binary I/O and bytevectors.
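
A minimal sketch of such a roll-your-own filter follows.  It works
on raw bytes, so no character conversion can be inserted.  The
names filter-bytes and upcase-ascii are hypothetical, and the
binary port procedures (read-u8, write-u8, open-input-bytevector,
open-output-bytevector, get-output-bytevector) are assumed to be
those of an R7RS-style (scheme base).

```scheme
(import (scheme base))

;; Copy bytes from the binary port IN to the binary port OUT,
;; passing each byte through F.
(define (filter-bytes in out f)
  (let loop ((b (read-u8 in)))
    (unless (eof-object? b)
      (write-u8 (f b) out)
      (loop (read-u8 in)))))

;; Example byte transformation: upcase ASCII a-z, leave the rest.
(define (upcase-ascii b)
  (if (and (>= b 97) (<= b 122))
      (- b 32)
      b))

;; In-memory usage: the bytes of "hi!" become the bytes of "HI!".
(define (demo)
  (let ((in (open-input-bytevector (bytevector 104 105 33)))
        (out (open-output-bytevector)))
    (filter-bytes in out upcase-ascii)
    (get-output-bytevector out)))
```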

If you're writing for a specific situation where
external and internal encoding match (which is rather
rare; even the OS and the user's environment settings
affect the situation), then you can choose a specific
implementation that supports mutable strings as an extension.

> Given the problematics of Unicode encoding,
> I think the time is ripe to bite the bullet and
> make the primitive string-replace! (which replaces
> in situ an arbitrary substring with an arbitrary string).

Right.  I have always felt that preserving just string-set! and
string-fill! doesn't make sense.  If the mutable-string camp
insists on length-changing operations as well, then it makes
much more sense.
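
For concreteness, here is a non-destructive sketch of the kind of
length-changing replacement being discussed.  string-splice is a
hypothetical name; an in-place string-replace! would have the same
observable result but update the original string object.

```scheme
(import (scheme base))

;; Return a new string in which the characters of S from START
;; (inclusive) to END (exclusive) are replaced by NEW.  The result
;; may be longer or shorter than S.
(define (string-splice s start end new)
  (string-append (substring s 0 start)
                 new
                 (substring s end (string-length s))))
```

(string-splice "hello world" 6 11 "Scheme") yields "hello Scheme",
one character longer than the input, which a fixed-length
character array cannot accommodate without reallocation.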

Having arbitrary length-changing operations basically
abandons the view of string-as-a-fixed-length-character-array.
The internal representation is implementation-dependent, but
importantly, string-set! may or may not be O(1).  Thus, using
mutable strings as string buffers is discouraged.  String
mutation may be thread-unsafe.  If this view of
string-as-elaborated-data-structure is shared, then I think I
can live with mutable strings in the standard.

--shiro


