Re: eliminate junk from std.string?

Andrei Alexandrescu Mon, 10 Jan 2011 17:30:29 -0800

On 1/10/11 2:41 AM, bearophile wrote:

Lars T. Kyllingstad:

My suggestions for things to remove:

hexdigits, digits, octdigits, lowercase, letters, uppercase, whitespace
  - What are these arrays useful for?

capwords()
  - It tries to do too much.

zfill()
  - The ljustify(), rjustify(), and center() functions
    should instead take an optional padding character
    that defaults to a space.

maketrans(), translate()
  - I don't even understand what these do.

inPattern(), countchars(), removechars()
  - Pattern matching is std.regex's charter.

squeeze(), succ(), tr(), soundex(), column()
  - I am having a very hard time imagining myself ever
    using these functions...


I agree with about nothing you have said :-)

How much string processing do you do day by day? I am using most of
those things... If you are used to using Python or Ruby you probably
find most of those things useful. If Andrei removes arrays like
lowercase, letters, uppercase, I will have to write them myself in
code.

The arrays letters, uppercase, and lowercase aren't all that useful because they only make sense for ASCII. Besides, they should be encoded as functions.

ljustify(), rjustify(), and center() are very useful, even if
they may be improved in some ways.

Hmmm. I suspected everyone's list will be different :o). I personally think the justification and centering functions are rarely useful - how often does one need to justify plain text? If you generate HTML, the markup will do that for you, and if you generate some nice text then the font will be proportional, so the functions are useless.

Nevertheless, I ported them (and also fixed them - they were broken for anything non-ASCII, which probably is telling of the extent of their usage).


What are your use cases for these three functions?

maketrans() and translate() (as
other things) come from Python string functions, and I have used them
a hundred times in string processing code. I have used squeeze()
sometimes. soundex is not hurting, because even if it's not commonly
necessary, its name is easy to understand and it's not easy to mistake
for something different, so it doesn't add much noise to the library.
And I've seen that it's easy to implement soundex wrongly, while the
one in std.string is correct.

I think maketrans/translate are okay (if a bit arcane) but they need to be ported to Unicode.

Python apparently does mind Unicode as of 3.x, although I'm not sure exactly what the semantics are: http://stackoverflow.com/questions/3031045/how-come-string-maketrans-does-not-work-in-python-3-1. One odd thing is that you'd expect a dynamic language like Python to dynamically detect ASCII vs. non-ASCII. The example shows that Python rejects string-based translation tables even when they are, in fact, ASCII.
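
For reference, here is a minimal sketch of how the Python 3 functions behave, as far as I can tell: str.maketrans builds a table keyed by code point (so non-ASCII works), while bytes objects get their own 256-byte table from bytes.maketrans. The sample strings are made up for illustration.

```python
# Python 3: translation tables for str are dicts keyed by code point.
table = str.maketrans("abc", "xyz")
print(table)                        # {97: 120, 98: 121, 99: 122}
print("aabbcc".translate(table))    # xxyyzz

# Non-ASCII characters work the same way, since keys are code points.
greek = str.maketrans("αβ", "ab")
print("αβγ".translate(greek))       # abγ

# bytes objects use a separate 256-byte table built by bytes.maketrans.
btable = bytes.maketrans(b"abc", b"xyz")
print(len(btable))                  # 256
print(b"aabbcc".translate(btable))  # b'xxyyzz'
```

So the str and bytes worlds are kept apart by type rather than by inspecting the contents, which would explain why an all-ASCII str table is still rejected for bytes.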

I agree that too much stuff is generally bad in a library, because
searching for something requires more time if there are more items to
search through. In Bugzilla I have three or four bug reports that ask
for a few small changes in std.string (like removing chop and keeping
chomp). But please don't remove too much. In a library more is often
better.

I think we should remove all functions that rely on patterns represented as strings: inPattern, countchars, removechars, squeeze, munch.

Representing patterns as a convention on top of otherwise untyped strings doesn't seem a good solution for D. We should either go with regex or with a simple pattern structure and a helper function. That way people can say e.g. munch(s, pattern("[0-9]")).
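
To make the idea concrete, here is a rough sketch of the "pattern structure plus helper function" shape. It is written in Python only for brevity, and every name in it (Pattern, pattern, this munch) is hypothetical, not an existing std.string or Python API. I'm assuming the pattern is a single character class and that munch returns the matching prefix together with the remainder, since Python can't update the argument in place.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    """Hypothetical typed wrapper, so a pattern is not just a bare string."""
    regex: object  # a compiled regular expression for one character class

def pattern(char_class):
    """Hypothetical helper: compile a character class such as "[0-9]"."""
    return Pattern(re.compile(char_class))

def munch(s, pat):
    """Split s into (leading characters matching pat, the remainder)."""
    if not isinstance(pat, Pattern):
        raise TypeError("munch expects a Pattern, not a raw string")
    i = 0
    # Advance while each single character matches the character class.
    while i < len(s) and pat.regex.fullmatch(s[i]):
        i += 1
    return s[:i], s[i:]

print(munch("123abc", pattern("[0-9]")))  # ('123', 'abc')
```

The point of the wrapper is that passing a plain string where a pattern is expected becomes an error instead of a silent convention; in D the same check would happen at compile time.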



Andrei
