Carl Banks wrote:
On Oct 18, 4:07 pm, Ethan Furman <et...@stoneleaf.us> wrote:

Dave Angel wrote:

Earlier, I would have agreed with you.  I assumed that this could be
done invisibly, with the only difference being performance.  But you
can't know whether join will do the trick without error till you know
that all the items are strings or Unicode strings.  And you can't check
that without going through the entire iterator.  At that point it's too
late to change your mind, as you can't back up an iterator.  So the user
who supplies a list with mixed strings and other stuff will get an
unexpected error, one that join generates.

To put it simply, I'd say that sum() should not dispatch to join()
unless it could be sure that no errors might result.
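
A minimal sketch of the iterator problem described above, assuming Python 3 (where str replaces the older string/Unicode pair) and a hypothetical generator named mixed_items: checking that every item is a string consumes the iterator, so nothing is left to fall back on afterwards.

```python
def mixed_items():
    # Hypothetical input: strings followed by one non-string item.
    yield "a"
    yield "b"
    yield 3

items = mixed_items()

# The type check has to walk the iterator to find the offending item...
all_strings = all(isinstance(item, str) for item in items)

# ...and an iterator cannot be rewound, so the items already seen are gone.
leftover = list(items)

print(all_strings, leftover)
```

Copying the items into a list first would avoid this, at the cost of holding the whole input in memory.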

How is this different than passing a list to sum with other incompatible
types?

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> class Dummy(object):
...     pass
...
>>> test1 = [1, 2, 3.4, Dummy()]
>>> sum(test1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'float' and 'Dummy'
>>> test2 = ['a', 'string', 'and', 'a', Dummy()]
>>> ''.join(test2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 4: expected string, Dummy found

Looks like a TypeError either way, only the verbiage changes.



This test doesn't mean very much since you didn't pass the same
list to both calls.  The claim is that "".join() might do something
different than a non-special-cased sum() would have when called on the
same list, and indeed that is true.

Consider this thought experiment:


class Something(object):
    def __radd__(self, other):
        # Invoked for `other + self` when other cannot handle the addition.
        return other + "q"

x = ["a", "b", "c", Something()]


If x were passed to "".join(), it would throw an exception; but if
passed to a sum() without any special casing, it would successfully
return "abcq".

Thus the two behaviors diverge, and so transparently calling "".join()
to perform the summation is a Bad Thing Indeed, a much worse
special-case behavior than throwing an exception.
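
The thought experiment can be run as a rough sketch, assuming Python 3. The helper plain_sum is hypothetical: it stands in for a sum() with no string special-casing by folding + over the items, since the built-in sum() refuses a string start value.

```python
from functools import reduce
import operator

class Something(object):
    def __radd__(self, other):
        # Invoked for `other + self` when other cannot handle the addition.
        return other + "q"

def plain_sum(items, start):
    # Hypothetical stand-in for a non-special-cased sum(): just applies +.
    return reduce(operator.add, items, start)

x = ["a", "b", "c", Something()]

# Plain addition succeeds because "abc" + Something() falls back to __radd__.
summed = plain_sum(x, "")

# join() insists on real strings and rejects the last item.
try:
    joined = "".join(x)
except TypeError as exc:
    joined = None
    join_error = exc

print(summed, joined)
```

The same list gives a result through addition and an exception through join(), which is the divergence in question.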


Carl Banks

Unfortunately, I don't know enough about how join works to know that, but I'll take your word for it. Perhaps the better solution then is to not worry about optimization, and just call __add__ on the objects. Then it either works, or throws the appropriate error.

This is obviously slow on strings, but mention of that is already in the docs, and profiling will also turn up such bottlenecks. Get the code working first, then optimize, yes? We've all seen questions on this list with folk using the accumulator method for joining strings, and then wondering why it's so slow -- the answer given is the same as we would give for sum()ing a list of strings -- use join instead. Then we have Python following the same advice we give out -- don't break duck-typing, any ensuing errors are the responsibility of the caller.
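
For reference, a small sketch of the two approaches mentioned above, assuming Python 3 and a made-up list named words. Both build the same string; the accumulator version may copy the partial result on each step, which is why join is the usual advice.

```python
words = ["spam", "eggs", "ham"]

# Accumulator method: repeated + may copy the growing string each time.
accumulated = ""
for word in words:
    accumulated = accumulated + word

# Recommended method: join builds the result in one call.
joined = "".join(words)

# The built-in sum() currently refuses strings outright.
try:
    sum(words, "")
    sum_rejected = False
except TypeError:
    sum_rejected = True

print(accumulated, joined, sum_rejected)
```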

~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list
