The .trans method and Least Surprise

Carl Mäsak Fri, 13 Jul 2012 12:10:05 -0700

Something's bothering me about the .trans method. This email outlines
a proposal to split its semantics into two methods.


I'm not yet convinced myself about this proposal. It's quite late in
the game to make spec changes of established methods, and the change
will break some downstream application code, some of which I wrote. So
if the advantages don't come shining through, I will withdraw my
proposal. But I thought I would make a case for it and see what people
think of it.

Here's the .trans spec:

 <http://perlcabal.org/syn/S05.html#Transliteration>

Summary of the spec: .trans does what tr/// does (in both Perl 5 and
Perl 6). .trans also does a bunch of cool stuff with strings and
regexes and Longest-Token Matching (LTM) that tr/// never did.

My proposal is to split away the cool stuff with strings and regexes
and LTM into its own method.
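
To make the distinction concrete, here is a rough sketch of the two
behaviors. It is written in Python rather than Perl 6, purely as an
illustration; the function names transliterate_chars and
substitute_tokens are hypothetical, and the longest-match rule is
only approximated for literal strings by trying longer keys first.
It is not how any Perl 6 implementation does it.

```python
import re


def transliterate_chars(text, src, dst):
    """tr///-style: each character in src maps to the character at
    the same position in dst (assumes src and dst have equal length)."""
    return text.translate(str.maketrans(src, dst))


def substitute_tokens(text, mapping):
    """Parallel, one-pass substitution of substrings.

    Python's regex alternation is ordered rather than longest-match,
    so keys are sorted longest first to imitate longest-token matching
    for plain strings. Replacement text is never rescanned.
    """
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Character-level: 'e' -> 'i', 'l' -> 'p'
print(transliterate_chars("hello", "el", "ip"))

# Substring-level: "&&" wins over "&" at the same position, and the
# "&" inside the inserted "&lt;" is not substituted again.
print(substitute_tokens("a < b && c", {"&": "&amp;", "<": "&lt;", "&&": "and"}))
```

The first function is all that tr/// needs; the second is the kind of
thing I'd like to see moved into its own method.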

For purposes of concreteness, I will call this proposed method
.translate -- name proposals are allowed in the thread and on the
#perl6 channel, but in order to reduce bikeshedding, proposals without
a rationale will be ignored. Also note that the discussion is not
primarily about the name, but about the split itself.

Here are my reasons for the split:

* The spec literally goes from saying that the tr/// operator has a
method form, to overloading this method form with features that are
not in tr///. The method is showing signs of lack of cohesion.

* I use the more advanced features frequently, and they're great for
parallel, one-pass substitution of substrings. They're an improvement
over Perl 5's corresponding features. I don't want them to go away,
just to separate them into their own method.

* Linguistically, trans*literation* (which is what C<tr> stands for)
is about replacing individual characters. Substituting bigger chunks
is a kind of translation.

* Over the years, I've seen people struggling with the API of .trans,
which is for all intents and purposes two separate APIs: one involving
pairs of strings, and one involving pairs of Positional (spec says
Array). Something about the whole thing violates Least Surprise. This
whole email was prompted by my having to check the spec for the
umpteenth time and realizing that the API simply doesn't vibe with me.
Splitting up the two separate APIs into two methods would help.

* The more advanced parts of the current API -- the ones that allow
the right hand sides of pairs to be regexes or closures -- could be
migrated out along with the pairs-of-Positional API. Then the .trans
method would be left to handle *only* the things that tr/// handles,
and the .translate method could do the cool stuff.

* .trans could then have specially optimized code that does one-char
substitution efficiently. We're not there yet: because of the cool
stuff that I propose to move out into .translate, the current plans
for an efficient implementation of .trans involve somehow generating
a grammar on the fly and then running it on the string to be
substituted. That seems like extreme overkill for tr///.
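
As a sketch of why the last point matters: character-only
substitution needs nothing more than a lookup table and one pass over
the string. The Python below is an illustration under my own
assumptions, not a description of any existing implementation; the
names build_table and apply_table are made up, and unlike tr/// it
simply rejects lists of unequal length.

```python
def build_table(src, dst):
    """Precompute a character-to-character mapping once."""
    if len(src) != len(dst):
        # Simplification for this sketch: require equal-length lists.
        raise ValueError("src and dst must have the same length")
    return dict(zip(src, dst))


def apply_table(text, table):
    """Single pass: look each character up, keep it if unmapped."""
    return "".join(table.get(ch, ch) for ch in text)


table = build_table("abc", "xyz")
print(apply_table("aabbcc cab", table))
```

No grammar and no regex engine are involved, which is the kind of
fast path a .trans limited to tr/// semantics could take.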

Have these things been bothering any of you too? Does the split make
sense? Would it help us to simplify the API and make people Less
Surprised by it? Are the projected wins of the spec change worth the
upsetting of the ecosystem?

Hopefully,
// Carl
