On 1/10/11 2:41 AM, bearophile wrote:
Lars T. Kyllingstad:

My suggestions for things to remove:

hexdigits, digits, octdigits, lowercase, letters, uppercase, whitespace
  - What are these arrays useful for?

capwords()
  - It tries to do too much.

zfill()
  - The ljustify(), rjustify(), and center() functions
    should instead take an optional padding character
    that defaults to a space.

maketrans(), translate()
  - I don't even understand what these do.

inPattern(), countchars(), removechars()
  - Pattern matching is std.regex's charter.

squeeze(), succ(), tr(), soundex(), column()
  - I am having a very hard time imagining myself ever
    using these functions...

I agree with about nothing you have said :-)

How much string processing do you do day to day? I am using most of
those things... If you are used to using Python or Ruby you will probably
find most of those things useful. If Andrei removes arrays like
lowercase, letters, uppercase, I will have to write them myself in
code.

The arrays letters, uppercase, and lowercase aren't all that useful because they only make sense for ASCII. Besides, they should be encoded as functions.
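
To illustrate the ASCII point, here is a minimal sketch in Python rather than D, since the thread keeps comparing against Python's string functions. It contrasts a fixed letter table with a predicate function. The helper count_lowercase is a hypothetical name made up for this example and is not part of any library discussed here.

```python
import string

# The fixed table covers only the 26 ASCII lowercase letters.
assert "e" in string.ascii_lowercase
assert "\u00e9" not in string.ascii_lowercase  # U+00E9, e with acute accent

# A predicate function consults Unicode character properties instead.
assert "\u00e9".islower()
assert "\u00c9".isupper()  # U+00C9, E with acute accent


def count_lowercase(text: str) -> int:
    """Hypothetical helper: count lowercase letters with the predicate."""
    return sum(1 for ch in text if ch.islower())
```

A table lookup would report three lowercase letters in "caf\u00e9"; the predicate reports four.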

ljustify(), rjustify(), and center() are very useful, even if
they may be improved in some ways.

Hmmm. I suspected everyone's list will be different :o). I personally think the justification and centering functions are rarely useful - how often does one need to justify plain text? If you generate HTML the markup will do that for you and if you generate some nice text then the font will be proportional so the functions are useless.

Nevertheless, I ported them (and also fixed them - they were broken for anything non-ASCII, which probably is telling of the extent of their usage).

What are your use cases for these three functions?
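
The post does not say how the old functions were broken for non-ASCII input. One plausible cause, assumed here purely for illustration, is computing the padding from the encoded byte length instead of the character count. The Python sketch below shows that failure mode next to a code-point-based version that also takes the optional padding character suggested earlier in the thread. All function names are hypothetical stand-ins, not the std.string implementations.

```python
def ljustify_by_bytes(s: str, width: int, fill: str = " ") -> str:
    """Deliberately buggy: measures the string in UTF-8 bytes."""
    pad = max(0, width - len(s.encode("utf-8")))
    return s + fill * pad


def ljustify(s: str, width: int, fill: str = " ") -> str:
    """Left-justify, measuring the string in code points."""
    pad = max(0, width - len(s))
    return s + fill * pad


def rjustify(s: str, width: int, fill: str = " ") -> str:
    """Right-justify; with fill="0" this covers the zfill() use case."""
    pad = max(0, width - len(s))
    return fill * pad + s
```

With the two-character string "n\u00e9" and a width of 5, the byte-based version adds only two spaces because the accented letter occupies two bytes in UTF-8. Counting code points is still an approximation: combining marks and double-width characters would need more care.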

maketrans() and translate() (like
other things) come from Python string functions, and I have used them
a hundred times in string processing code. I have used squeeze()
sometimes. soundex is not hurting, because even if it's not commonly
necessary, its name is easy to understand and it's not easy to mistake
for something different, so it doesn't add much noise to the library.
And I've seen that it's easy to implement soundex wrongly, while the
one in std.string is correct.

I think maketrans/translate are okay (if a bit arcane) but they need to be ported to Unicode.

Python apparently does mind Unicode as of 3.x, although I'm not sure exactly what the semantics are: http://stackoverflow.com/questions/3031045/how-come-string-maketrans-does-not-work-in-python-3-1. One odd thing is that you'd expect a dynamic language like Python to dynamically detect ASCII vs. non-ASCII. The example shows that Python rejects string-based translation tables even when they are, in fact, ASCII.
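
For reference, a small sketch of how the Python 3 versions behave as far as I can tell: str.maketrans builds a dictionary keyed by code points, so it is not limited to ASCII, while bytes.maketrans builds a 256-entry table for byte strings. The variable names are only for illustration.

```python
# Text strings: the table is a dict from code point to code point.
table = str.maketrans("abc", "xyz")
translated = "aabbcc".translate(table)

# Non-ASCII keys work the same way (Greek alpha and beta here).
greek = str.maketrans("\u03b1\u03b2", "ab")
from_greek = "\u03b1\u03b2\u03b3".translate(greek)

# The optional third argument lists characters to delete.
strip_digits = str.maketrans("", "", "0123456789")
without_digits = "a1b2c3".translate(strip_digits)

# Byte strings use a separate, 256-byte lookup table.
byte_table = bytes.maketrans(b"abc", b"xyz")
translated_bytes = b"aabbcc".translate(byte_table)
```

Characters without an entry in the table, such as the gamma in the Greek example, pass through unchanged.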

I agree that too much stuff is generally bad in a library, because
searching for something requires more time if there are more items to
search through. In Bugzilla I have three or four bug reports that ask
for a few small changes in std.string (like removing chop and keeping
chomp). But please don't remove too much. In a library more is often
better.

I think we should remove all functions that rely on patterns represented as strings: inPattern, countchars, removechars, squeeze, munch.

Representing patterns as a convention on top of otherwise untyped strings doesn't seem a good solution for D. We should either go with regex or with a simple pattern structure and a helper function. That way people can say e.g. munch(s, pattern("[0-9]")).
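
A rough sketch of what the "pattern structure plus helper function" idea could look like, written in Python only to keep it short. Everything here is hypothetical: the Pattern type, the pattern() helper, and this munch() are illustrations of the design, not an existing or proposed std.string API, and the sketch assumes the pattern is a single regex-style character class.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pattern:
    """Hypothetical typed wrapper, so a pattern is not just a bare string."""
    regex: "re.Pattern"


def pattern(char_class: str) -> Pattern:
    # Assumption: char_class is one character class such as "[0-9]".
    return Pattern(re.compile(char_class))


def munch(s: str, pat: Pattern) -> Tuple[str, str]:
    """Split s into its longest leading run of matching characters and the rest."""
    i = 0
    while i < len(s) and pat.regex.fullmatch(s[i]):
        i += 1
    return s[:i], s[i:]


def countchars(s: str, pat: Pattern) -> int:
    """Count the characters of s that match the pattern."""
    return sum(1 for ch in s if pat.regex.fullmatch(ch))
```

The point of the wrapper type is that munch(s, pattern("[0-9]")) states in the signature that the second argument is a pattern, whereas a plain string argument relies on convention alone.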


Andrei
