From: Thomas Lord <[email protected]>
Subject: Re: [r6rs-discuss] Proposed NON-features for small Scheme, part 8: string-set! must die
Date: Sun, 20 Sep 2009 12:07:51 -0700
> > RnRS abandoning mutable strings does *not* prevent such
> > tiny Scheme from having mutable strings as an implementation's
> > extention.
>
> And vice versa.

No. There's an asymmetry here.

* A Scheme with mutable-only strings can still use string libraries
  that are written for immutable strings.
* A Scheme with immutable-only strings cannot use string libraries
  that are written for mutable strings.

> Both are "nice to have" and I would expect that
> most implementations will want to support both.
> It would be good to sanctify some specification
> of both types and how they relate.

The problem is that introducing mutable strings suddenly bloats the
spec. You have to mark which operations return immutable strings.
Substring-like operations need two versions, one returning a fresh
string and the other returning a possibly shared string.

Of course the same can be said of pairs and vectors, but the usage
pattern is pretty different. I don't think we should overgeneralize
here.

> > [...]
> > Requiring string ports (string builder) shouldn't be much
> > burden to the tiny Scheme;
>
> String ports are an example of a generic
> problem for which disjointed, piecemeal
> solutions seem the wrong approach (puns intended).
[snip]

Yes, it's good to have a small, clear core built on a generic
approach. Let's have it. But what does that have to do with
mutable/immutable strings?

> > I feel that your discussion explains why mutable
> > string benefits tiny Scheme, but doesn't support why
> > mutable strings should be in the standard.
>
> That is because we have to first agree on the
> desired form and function of the standard. [...]
> My thought for R7/small is for an even smaller
> than traditional core, with "the rest" given both
> narrative and code definitions.
[snip]

I basically agree with your discussion here. From my side, we can
provide the code definitions of, say, string ports via a mutable
vector and vector->string; so far that seems orthogonal to the
mutable/immutable string discussion.
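The remark that string ports can be given a code definition using a mutable vector and vector->string can be made concrete. What follows is a minimal sketch, not code from this thread: make-builder, builder-add-char!, builder-add-string! and builder->string are hypothetical names, and the final conversion uses list->string so the sketch also runs on R5RS-era implementations (R7RS vector->string could replace that loop). No string is mutated anywhere, so it behaves the same whether or not an implementation's strings are mutable.

```scheme
;; Hypothetical string builder: characters accumulate in a mutable
;; vector that doubles in size when full. No string is ever mutated.

(define (make-builder)
  ;; State: a two-slot vector #(char-vector fill-count).
  (vector (make-vector 16 #\space) 0))

(define (builder-add-char! b c)
  (let ((buf (vector-ref b 0))
        (n   (vector-ref b 1)))
    (if (= n (vector-length buf))
        ;; Full: copy the characters into a vector twice as large.
        (let ((bigger (make-vector (* 2 n) #\space)))
          (do ((i 0 (+ i 1))) ((= i n))
            (vector-set! bigger i (vector-ref buf i)))
          (vector-set! b 0 bigger)
          (set! buf bigger)))
    (vector-set! buf n c)
    (vector-set! b 1 (+ n 1))))

(define (builder-add-string! b s)
  (do ((i 0 (+ i 1))) ((= i (string-length s)))
    (builder-add-char! b (string-ref s i))))

(define (builder->string b)
  ;; Collect the first fill-count characters into a fresh string.
  ;; (In R7RS, (vector->string buf 0 fill-count) does the same.)
  (let ((buf (vector-ref b 0)))
    (let loop ((i (- (vector-ref b 1) 1)) (acc '()))
      (if (< i 0)
          (list->string acc)
          (loop (- i 1) (cons (vector-ref buf i) acc))))))

;; Usage
(define b (make-builder))
(builder-add-string! b "Hello, ")
(builder-add-string! b "world")
(builder-add-char! b #\!)
(builder->string b)   ; => "Hello, world!"
```

Doubling the vector keeps appends amortized O(1), and the only string allocated is the final result, which is the property a string port needs.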
> > It is plausible, but could you support your opinion with
> > some concrete observation, experience, or algorithms?
> > The counter observation of that 9 years of experience in
> > Gauche community.
>
> One of the more fun projects I've done in Scheme
> was an Emacs-like text editor. For that, I found
> a very nice data-structure (good trade-offs) was
> a kind of unholy mix of "gap buffers" (like in GNU
> Emacs) with "ropes" (big strings represented as
> (in this case) splay trees of smaller strings).
> Modifying strings in the middle was important to
> good performance for this. Not being able to modify
> strings in the middle with expected-case decent
> efficiency would have meant too much copying of data
> or too high a fragmentation of long strings.

If you represent the entire text in an elaborate structure, why do
the leaves need to be Scheme strings? You cannot treat the entire
text, or subtrees of it, as a Scheme string anyway; you need a
special API to deal with them. So you can just as well use mutable
vectors in the leaf nodes. (Of course, if your Scheme has mutable
strings then it's fine to use them. A portable library with optional
implementation-specific optimizations can be configured either way.)

> I noticed in the list of string-set! uses that Aubrey
> posted from SLIB, one of the uses came from a library
> that provided a "format" function: something that takes
> a format string and a bunch of other parameters and
> creates a new string (like sprintf in C). That
> strikes me as another case where string mutation is
> very handy for avoiding excess data copying and
> consing.

Here I'd like to hear from Aubrey. To me, formatting is one area
where a string-builder pattern makes much more sense, since the
length of the final string isn't generally known beforehand (and
Gauche's format is implemented that way). What advantage did you
see in using string-set! in format?
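The suggestion that the leaves of an editor's text structure can be mutable vectors rather than Scheme strings can be illustrated with a gap buffer. This is a minimal sketch under stated assumptions, not the editor Thomas describes: the gb- procedure names are hypothetical, a buffer is a three-slot vector holding the character storage and the two gap boundaries, and only leftward cursor movement is shown. A Scheme string is produced only when text leaves the structure, via gb->string.

```scheme
;; Hypothetical gap buffer over a mutable vector of characters.
;; Layout of a storage vector of capacity C:
;;   [0, start)    text before the cursor
;;   [start, end)  the gap (unused slots)
;;   [end, C)      text after the cursor

(define (make-gb capacity)
  ;; A buffer is a three-slot vector: #(storage gap-start gap-end).
  (vector (make-vector capacity #\space) 0 capacity))

(define (gb-buf g)   (vector-ref g 0))
(define (gb-start g) (vector-ref g 1))
(define (gb-end g)   (vector-ref g 2))

(define (gb-grow! g)
  ;; Enlarge the storage, moving the after-cursor text to the new end.
  (let* ((old (gb-buf g))
         (cap (vector-length old))
         (new-cap (* 2 (+ cap 1)))
         (bigger (make-vector new-cap #\space))
         (tail (- cap (gb-end g))))
    (do ((i 0 (+ i 1))) ((= i (gb-start g)))
      (vector-set! bigger i (vector-ref old i)))
    (do ((i 0 (+ i 1))) ((= i tail))
      (vector-set! bigger (+ (- new-cap tail) i)
                   (vector-ref old (+ (gb-end g) i))))
    (vector-set! g 0 bigger)
    (vector-set! g 2 (- new-cap tail))))

(define (gb-insert! g c)
  ;; Insert C before the cursor; amortized O(1), no string is touched.
  (if (= (gb-start g) (gb-end g))
      (gb-grow! g))
  (vector-set! (gb-buf g) (gb-start g) c)
  (vector-set! g 1 (+ (gb-start g) 1)))

(define (gb-left! g)
  ;; Move the cursor left by carrying one character across the gap.
  (if (> (gb-start g) 0)
      (let ((s (- (gb-start g) 1))
            (e (- (gb-end g) 1)))
        (vector-set! (gb-buf g) e (vector-ref (gb-buf g) s))
        (vector-set! g 1 s)
        (vector-set! g 2 e))))

(define (gb->string g)
  ;; Build a fresh Scheme string only at the boundary, skipping the gap.
  (let ((buf (gb-buf g)))
    (let loop ((i (- (vector-length buf) 1)) (acc '()))
      (cond ((< i 0) (list->string acc))
            ((and (>= i (gb-start g)) (< i (gb-end g)))
             (loop (- i 1) acc))
            (else (loop (- i 1) (cons (vector-ref buf i) acc)))))))

;; Usage: type "helo", step back one character, insert the missing #\l.
(define g (make-gb 4))
(for-each (lambda (c) (gb-insert! g c)) (string->list "helo"))
(gb-left! g)
(gb-insert! g #\l)
(gb->string g)   ; => "hello"
```

Edits in the middle stay cheap because only the characters crossing the gap move; a rope of such buffers would keep the same property without requiring mutable strings.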
> In any application where I/O filtering (read
> some input, tweak it, write output) needs to be
> efficient, again, to avoid excessive data copying
> string mutation is a big boon.

Any *PORTABLE* I/O filtering in the character/string domain has to
accept that arbitrary binary<->character conversion may be inserted
during input and output. If you don't like that, you need to roll
your own with binary I/O and bytevectors. If you're writing for a
specific situation where the external and internal encodings match
(which is rather rare; even the OS and the user's environment
settings affect this), then you can choose a specific implementation
that supports a mutable-string extension.

> Given the problematics of Unicode encoding,
> I think the time is ripe to bite the bullet and
> make the primitive string-replace! (which replaces
> in situ an arbitrary substring with an arbitrary string).

Right. I've always felt that protecting only string-set! and
string-fill! doesn't make sense. If the mutable-string camp insists
on length-changing operations as well, then it makes much more sense.

Having arbitrary length-changing operations basically abandons the
view of string-as-a-fixed-length-character-array. The internal
representation is implementation-dependent, but importantly,
string-set! may or may not be O(1). Thus, using mutable strings as
string buffers is discouraged. String mutation may be thread-unsafe.
If this view of string-as-elaborated-data-structure is shared, then
I think I can live with mutable strings in the standard.

--shiro

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
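To illustrate the string-as-elaborated-data-structure view from the string-replace! discussion in the message above, here is a minimal sketch of a length-changing replace operation. It is an illustration under stated assumptions, not a proposed specification: make-text, text-length, text-replace!, text-set! and text->string are hypothetical names, and the representation (a one-slot vector holding a string that is never modified in place) is deliberately naive. It shows that once replacement may change the length, a single-character set becomes a special case of it and, in this representation, costs O(n) rather than O(1).

```scheme
;; Hypothetical "text" object: a one-slot vector holding a string that is
;; never modified in place, so this also runs where strings are immutable.

(define (make-text s) (vector s))
(define (text->string t) (vector-ref t 0))
(define (text-length t) (string-length (vector-ref t 0)))

(define (text-replace! t start end replacement)
  ;; Replace characters [start, end) with REPLACEMENT, which may be longer
  ;; or shorter than the replaced span. Builds a fresh string: O(length).
  (let ((s (vector-ref t 0)))
    (vector-set! t 0
                 (string-append (substring s 0 start)
                                replacement
                                (substring s end (string-length s))))))

(define (text-set! t k c)
  ;; string-set! as a one-character replacement: still correct,
  ;; but no longer O(1) in this representation.
  (text-replace! t k (+ k 1) (string c)))

;; Usage
(define doc (make-text "hello world"))
(text-replace! doc 6 11 "Scheme")   ; text is now "hello Scheme"
(text-set! doc 0 #\H)
(text->string doc)                  ; => "Hello Scheme"
```

A real implementation would more likely use ropes or chunked buffers so that a replacement need not copy the whole text; the sketch only shows the interface consequence, namely that callers can no longer assume string-set! is a constant-time array store.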
