Re: [Python-Dev] sum()

2005-03-11 Thread Alex Martelli
On Mar 11, 2005, at 19:39, Raymond Hettinger wrote:
[Alex]
If you're considering revisions to sum's design, my suggestion would
be
to find a way to let the user tell sum to use a more accurate approach
when summing floats.
FWIW, when accuracy is an issue, I use:
   sum(sorted(data, key=abs))
...and you may still lose a LOT of accuracy that way:-(.
Raymond, you technically reviewed the 2nd ed Cookbook -- don't you 
recall the sidebar about why partials are the RIGHT way to do 
summations?  I was SO proud of that one, thought I'd made the issue 
clearer than anywhere previously in print...

To summarize: as we all know, a+b loses significant digits from (say) b 
when abs(a) is much larger than abs(b).  Now, say we're summing a 
million numbers, all positive and roughly the same magnitude.  If we do 
it by total += item (which is basically what sum does), by the time 
we've summed the first 100,000 or so total IS *much larger than item* 
in absolute value -- we're systematically tossing away about 5 digits 
from each new item by that time!!!

To avoid such a massacre of digits, one uses partials -- summing items 
two at a time to get half as many partials, then iterating that idea 
about log2(N) times when summing N items.  Problem is, one needs O(N) 
auxiliary space (assuming one can't just overwrite the incoming 
list/array/whatever).
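A minimal sketch of that partials scheme for a plain Python list (just to make
the mechanics concrete):

def pairwise_sum(values):
    # Repeatedly add adjacent pairs, so every addition combines numbers of
    # comparable magnitude; about log2(N) passes and O(N) auxiliary space.
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        paired = [values[i] + values[i + 1]
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:     # carry an odd leftover into the next round
            paired.append(values[-1])
        values = paired
    return values[0]

With a million positive addends of roughly the same magnitude, each addition
pairs partials of similar size, so far fewer low-order digits get thrown away
than with the running total += item loop.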

There's all other kinds of accuracy issues with a+b, but the one 
partials-based summation deals with is numero uno in many real-life 
situations, e.g. when one needs to get (say) the total area from N 
regions, each of roughly the same magnitude, whose areas were 
separately measured -- total length from N segments ditto -- total mass 
of N items ditto -- and so forth.

Alex


Re: [Python-Dev] sum()

2005-03-11 Thread Raymond Hettinger
[Alex]
> > FWIW, when accuracy is an issue, I use:
> >
> >sum(sorted(data, key=abs))
> 
> ...and you may still lose a LOT of accuracy that way:-(.
> 
> Raymond, you technically reviewed the 2nd ed Cookbook -- don't you
> recall the sidebar about why partials are the RIGHT way to do
> summations?  I was SO proud of that one, thought I'd made the issue
> clearer than anywhere previously in print...

No doubt about it.  Partials are superior.  My little snippet meets my
needs with no fuss -- my usual situation is many small values whose
accuracy gets clobbered upon encountering a handful of large values.

In the draft statistics module, nondist/sandbox/statistics/statistics.py,
I used yet another approach and tracked a separate error term.  The
advantage is adding precision while still keeping O(n) summation speed.

Of course, we can always use the decimal module to add as much precision
as needed.


Raymond


Re: [Python-Dev] sum()

2005-03-11 Thread Tim Peters
FYI, there are a lot of ways to do accurate fp summation, but in
general people worry about it too much (except for those who don't
worry about it at all -- they're _really_ in trouble <0.5 wink>).

One clever way is to build on the fact that whenever |x| and |y| are within a
factor of 2 of each other, x+y is exact in 754 arithmetic.  So you
never lose information (not even one bit) when adding two floats with
the same binary exponent.  That leads directly to this kind of code:

from math import frexp

class Summer:
    def __init__(self):
        self.exp2sum = {}

    def add(self, x):
        while 1:
            exp = frexp(x)[1]
            if exp in self.exp2sum:
                x += self.exp2sum.pop(exp) # exact!
            else:
                break
        self.exp2sum[exp] = x # trivially exact

    def total(self):
        items = self.exp2sum.items()
        items.sort()
        return sum((x for dummy, x in items), 0.0)

exp2sum maps a binary exponent to a float having that exponent.  If
you pass a sequence of fp numbers to .add(), then ignoring
underflow/overflow endcases, the key invariant is that the exact
(mathematical) sum of all the numbers passed to add() so far is equal
to the exact (mathematical) sum of exp2sum.values().  While it's not
obvious at first, the total number of additions performed inside add()
is typically a bit _less_ than the total number of times add() is
called.

More importantly, len(exp2sum) can never be more than about 2K.  The
worst case for that is having one input with each possible binary
exponent, like 2.**-1000 + 2.**-999 + ... + 2.**999 + 2.**1000.  No
inputs are like that in real life, and exp2sum typically has no more
than a few dozen entries.

total() then adds those, in order of increasing exponent == in order
of increasing absolute value.  This can lose information, but there is
no bad case for it, in part because there are typically so few
addends, and in part because the fact that no two addends have the same binary
exponent implies massive cancellation can never occur.

Playing with this can show why most fp apps shouldn't worry most
often.  Example:

Get a million floats of about the same magnitude:

>>> import random
>>> xs = [random.random() for dummy in xrange(1000000)]

Sum them accurately:

>>> s = Summer()
>>> for x in xs:
...     s.add(x)

No information has been lost yet (if you could look at your FPU's
"inexact result" flag, you'd see that it hasn't been set yet), and the
million inputs have been squashed into just a few dozen buckets:

>>> len(s.exp2sum)
22
>>> from pprint import pprint
>>> pprint(s.exp2sum)
{-20: 8.8332388070710977e-007,
 -19: 1.4206079529399673e-006,
 -16: 1.0065260162672729e-005,
 -15: 2.4398649189794064e-005,
 -14: 5.3980784313178987e-005,
 -10: 0.00074737138777436485,
 -9: 0.0014605198999595448,
 -8: 0.003361820812962546,
 -7: 0.0063811680318408559,
 -5: 0.016214300821313588,
 -4: 0.044836286041944229,
 -2: 0.17325355843673518,
 -1: 0.46194788522906305,
 3: 6.4590200674982423,
 4: 11.684394209886134,
 5: 24.715676913177944,
 6: 49.056084672323166,
 10: 767.69329043309051,
 11: 1531.1560084859361,
 13: 6155.484212371357,
 17: 98286.760386143636,
 19: 393290.34884990752}

Add those 22, smallest to largest:

>>> s.total()
500124.06621686369

Add the originals, "left to right":

>>> sum(xs)
500124.06621685845

So they're the same to about 14 significant decimal digits.  No good
if exact pennies are important, but far more precise than real-world
measurements.

How much does sorting first help?

>>> xs.sort()
>>> sum(xs)
500124.06621685764

Not much!  It actually moved a bit in the wrong direction (which is
unusual, but them's the breaks).

Using decimal as a (much more expensive) sanity check:

>>> import decimal
>>> t = decimal.Decimal(0)
>>> for x in xs:
...     t += decimal.Decimal(repr(x))
>>> print t
500124.0662168636972127329455

Of course  Summer's result is very close to that.


Re: [Python-Dev] sum()

2005-03-11 Thread Tim Peters
[Tim Peters]
...
> One clever way is to build on the fact that whenever |x| and |y| are within a
> factor of 2 of each other, x+y is exact in 754 arithmetic.

Ack, I'm fried.  Forget that, it's wrong.  The correct statement is
that x-y is always exact whenever x and y are within a factor of two
of each other.  Summer.add() _can_ lose info -- it needs additional
trickery to make it loss-free, because x+y can lose (at most one
bit) when x and y have the same binary exponent.

Exercise for the abused reader.
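A tiny illustration of the one-bit loss described above, and of recovering it
with an error term (a minimal sketch; this is the trick used in the next
message):

x = 1.0
y = 1.0 + 2.0 ** -52    # same binary exponent as x, lowest bit set

hi = x + y              # true sum 2 + 2**-52 needs 54 bits, so it rounds to 2.0
lo = x - hi + y         # the rounded-away bit, computed exactly
assert hi == 2.0
assert lo == 2.0 ** -52 # mathematically, hi + lo == x + y exactly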


RE: [Python-Dev] sum()

2005-03-12 Thread Raymond Hettinger
> [Tim Peters]
> Summer.add() _can_ lose info -- it needs additional
> trickery to make it loss-free, and because x+y can lose (at most one
> bit) when x and y have the same binary exponent.

Computing an error term can get the bit back and putting that term back
in the input queue restores the overall sum.  Once the inputs are
exhausted, the components of exp2sum should be exact.



from math import frexp
from itertools import chain

def summer2(iterable):
    exp2sum = {}
    queue = []
    for x in chain(iterable, queue):
        mant, exp = frexp(x)
        while exp in exp2sum:
            y = exp2sum.pop(exp)
            z = x + y
            d = x - z + y   # error term
            assert z + d == x + y
            if d:
                queue.append(d)
            x = z
            mant, exp = frexp(x)
        exp2sum[exp] = x
    return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)


The implementation can be tweaked to consume the error term right away
so the queue won't grow by more than a few pending items.  Also, the speed
can be boosted by localizing frexp, exp2sum.pop, and queue.append.


Raymond Hettinger


Re: [Python-Dev] sum()

2005-03-12 Thread Tim Peters
[Raymond Hettinger]
> Computing an error term can get the bit back and putting that term back
> in the input queue restores the overall sum.

Right!

>  Once the inputs are exhausted, the components of exp2sum should be exact.

Ditto.  I'll cover some subtleties below:

> from math import frexp
> from itertools import chain
>
> def summer2(iterable):
>     exp2sum = {}
>     queue = []
>     for x in chain(iterable, queue):
>         mant, exp = frexp(x)
>         while exp in exp2sum:
>             y = exp2sum.pop(exp)
>             z = x + y
>             d = x - z + y   # error term
>             assert z + d == x + y

Much more is true there, but hard to assert in Python:  the
mathematical sum z+d is exactly equal to the mathematical sum x+y
there, and the floating-point |d| is less than 1 unit in the last
place relative to the floating-point |z|.  It would be clearer to name
"z" as "hi" and "d" as "lo".

More, that's not _generally_ true:  it relies on that x and y have the
same binary exponent.  For example, pick x=1 and y=1e100, and you end
up with hi=1e100 and lo=0.  The mathematical x+y isn't equal to the
mathematical hi+lo then.  It's a real weakness of "Kahan summation"
that most spellings of it don't bother to account for this; it's
sufficient to normalize first, swapping x and y if necessary so that
abs(x) >= abs(y) (although that isn't needed _here_ because we know
they have the same binary exponent).  There's another way to handle
the general case that doesn't require test, branch, or abs(), but at
the cost of several more additions/subtractions.
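For reference, the branch-free variant alluded to here is (presumably) Knuth's
two-sum; a sketch:

def two_sum(x, y):
    # Six adds/subtracts, no comparison or abs(); hi + lo == x + y exactly
    # for any finite x and y in 754 arithmetic.
    hi = x + y
    yr = hi - x                 # the portion of y that made it into hi
    xr = hi - yr                # the portion of x that made it into hi
    lo = (x - xr) + (y - yr)    # whatever rounding discarded
    return hi, lo

hi, lo = two_sum(1.0, 1e100)    # no normalization or swapping needed
assert hi == 1e100 and lo == 1.0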
 
>             if d:
>                 queue.append(d)

If x and y have different signs, this part doesn't trigger.  If all
inputs are positive, then we expect it to trigger about half the time
(the cases where exactly one of x's and y's least-significant bits is
set).  So the queue can be expected to grow to about half the length
of the original iterable by the time the original iterable is
exhausted.

>             x = z
>             mant, exp = frexp(x)
>         exp2sum[exp] = x
>     return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)
>
> The implementation can be tweaked to consume the error term right away
> so the queue won't grow by more than a few pending items.

Theorem 10 in Shewchuk's "Adaptive Precision Floating-Point Arithmetic
and Fast Robust Geometric Predicates" gives the obvious correct
way to do that, although as a practical matter it greatly benefits
from a bit more logic to eliminate zero entries as they're produced
(Shewchuk doesn't because it's not his goal to add a bunch of
same-precision floats).  BTW, extracting binary exponents isn't
necessary in his approach (largely specializations to "perfect" 754
arithmetic of Priest's earlier less-demanding methods).

>  Also, the speed can be boosted by localizing frexp, exp2sum.pop, and
> queue.append.

I'm very glad you quit while it was still interesting.

People who are paranoid about fp sums should use something like this. 
Even pairing is prone to disaster, given sufficiently nasty input. 
For example:

>>> xs = [1, 1e100, 1, -1e100] * 10000
>>> sum(xs)  # and the obvious pairing method gives the same
0.0
>>> sum(sorted(xs)) # the result is nearly incomprehensible
-8.0076811737544552e+087
>>> sum(sorted(xs, reverse=True))
8.0076811737544552e+087
>>> summer2(xs)  # exactly right
20000.0


RE: [Python-Dev] sum()

2005-03-12 Thread Raymond Hettinger
> the queue can be expected to grow to about half the length
> of the original iterable by the time the original iterable is
> exhausted.
> 
> >             x = z
> >             mant, exp = frexp(x)
> >         exp2sum[exp] = x
> >     return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)
> >
> > The implementation can be tweaked to consume the error term right away
> > so the queue won't grow by more than a few pending items.
> 
> Theorem 10 in Shewchuk's "Adaptive Precision Floating-Point Arithmetic
> and Fast Robust Geometric Predicates" gives the obvious  correct
> way to do that, although as a practical matter it greatly benifits
> from a bit more logic to eliminate zero entries as they're produced
> (Shewchuk doesn't because it's not his goal to add a bunch of
> same-precision floats).  BTW, extracting binary exponents isn't
> necessary in his approach (largely specializations to "perfect" 754
> arithmetic of Priest's earlier less-demanding methods).

The approach I'm using relies on being able to exactly multiply the 0 or
1 bit error term mantissas by an integer (a count of how many times the
term occurs).  With a Python dictionary keeping the counts, many steps
are saved and the tool becomes much more memory friendly than with the
previous queue approach.


from math import frexp

def summer3(iterable):
    exp2sum = {}        # map to a value with a given exponent
    errdict = {}        # map error terms to an occurrence count

    def addone(x, exppop=exp2sum.pop, errget=errdict.get, frexp=frexp):
        mant, exp = frexp(x)            # extract exponent from float representation
        while exp in exp2sum:
            y = exppop(exp)             # pair with a value having the same exp
            z = x + y                   # sum may be inexact by less than 1 ulp of z
            d = x - z + y               # difference is the error term
            assert z + d == x + y       # sum plus error term is exact!
            errdict[d] = errget(d, 0) + 1   # count occurrences of each term
            x = z                       # continue with new sum
            mant, exp = frexp(x)
        exp2sum[exp] = x

    for x in iterable:
        addone(x)

    while errdict:
        d, q = errdict.popitem()
        assert frexp(d)[0] in [-0.5, 0.0, 0.5]  # error terms are 1 bit
        # product of 1 bit error term with an integer count is exact
        addone(d * q)
        dummy = errdict.pop(0.0, None)

    # At this point, the errdict is empty, all partial sums are exact,
    # and each partial sum has a different exponent.  They can now be
    # added smallest absolute value to largest and only lose bits that
    # cannot fit in the final result.  IOW, the final result is accurate
    # to full precision (assuming no representation error in the inputs).
    return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)


Aside from being a nice recipe, this was a good exercise in seeing what
pure Python can do with IEEE-754 guarantees.  The modest memory
consumption and the typical O(n) runtime are a nice plus (no pun
intended).



Raymond


Re: [Python-Dev] sum()

2005-03-12 Thread David Ascher
You guys have way too much time on your hands and neurons devoted to
this stuff. I'm glad that means that I can spend the afternoon playing
w/ my kids and searching python-dev when I need to add numbers =).


Re: [Python-Dev] sum()

2005-03-12 Thread Tim Peters
[Raymond Hettinger]
> The approach I'm using relies on being able to exactly multiply the 0 or
> 1 bit error term mantissas by an integer (a count of how many times the
> term occurs).  With a Python dictionary keeping the counts, many steps
> are saved and the tool becomes much more memory friendly than with the
> previous queue approach.

Cool!  That's a nice approach.

For contrast, here's one that doesn't use frexp(), and is probably
faster because of that; internally, len(sums) probably won't exceed 5
in real life (but could get as large as 2K for pathological inputs,
spanning fp's full dynamic range):

def summer4(iterable):
    sums = [0.0]
    for x in iterable:
        sums.append(x)
        for i in xrange(len(sums)-2, -1, -1):
            y = sums[i]
            if abs(x) < abs(y):
                x, y = y, x
            hi = x+y
            lo = y - (hi - x)
            if lo:
                sums[i+1] = lo
            else:
                del sums[i+1]
            x = hi
        sums[0] = x
    return sum(reversed(sums), 0.0)

In effect, that keeps an arbitrarily long list of "error terms" so
that no info is lost before the final sum().  I think it's surprising
at first how short that list usually is.
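That's easy to check with a lightly instrumented copy of summer4 (a sketch,
tracking only how long the internal list gets):

import random

def summer4_maxlen(iterable):
    # Same loop as summer4 above, plus a record of the longest the list of
    # partials/error terms ever grows.
    sums = [0.0]
    maxlen = 1
    for x in iterable:
        sums.append(x)
        for i in range(len(sums) - 2, -1, -1):
            y = sums[i]
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)
            if lo:
                sums[i + 1] = lo
            else:
                del sums[i + 1]
            x = hi
        sums[0] = x
        maxlen = max(maxlen, len(sums))
    return sum(reversed(sums), 0.0), maxlen

total, maxlen = summer4_maxlen(random.random() for dummy in range(10 ** 6))
# Expect maxlen to stay around 5 or less for data like this, per the
# estimate above.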

> ...
> Aside from being a nice recipe, this was a good exercise in seeing what
> pure Python can do with IEEE-754 guarantees.

Now you know how I feel every time someone proclaims that fp
arithmetic is inherently unreliable <0.3 wink>.  Learning how to use
it correctly is indeed the blackest of obscure arts, though.

> The modest memory consumption and the typical O(n) runtime are a nice
> plus (no pun intended).

Yup, they're all eminently practical if you don't care too much about
speed.  The one above is the fastest I tried, and it still takes some
40x longer than plain sum() over a million-element random.random()
list.


RE: [Python-Dev] sum()

2005-03-28 Thread Raymond Hettinger
[Tim]
> For contrast, here's one that doesn't use frexp(), and is probably
> faster because of that; internally, len(sums) probably won't exceed 5
> in real life (but could get as large as 2K for pathological inputs,
> spanning fp's full dynamic range):
> 
> def summer4(iterable):
>     sums = [0.0]
>     for x in iterable:
>         sums.append(x)
>         for i in xrange(len(sums)-2, -1, -1):
>             y = sums[i]
>             if abs(x) < abs(y):
>                 x, y = y, x
>             hi = x+y
>             lo = y - (hi - x)
>             if lo:
>                 sums[i+1] = lo
>             else:
>                 del sums[i+1]
>             x = hi
>         sums[0] = x
>     return sum(reversed(sums), 0.0)


Here's a rewrite with the partial sums ordered by increasing magnitude.
A cursor is used to write-out new, non-zero terms.  That makes the list
growth or shrinkage occur at the end of the list and it avoids reversed
indexing.  Those two changes made the code easier for me to follow.

Leaving off the 0.0 makes the routine generic.  It works equally well
with int, long, Decimal, float, and complex.


def summer5(iterable, start=0):
    "Binary or Decimal floating point summation accurate to full precision."
    partials = []           # sorted, non-overlapping partial sums
    for x in iterable:
        i = 0               # cursor for writing out new partial sums
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    return sum(partials, start)



The loop invariants for the list of partial sums are:

assert partials == sorted(partials, key=abs)
assert nonoverlapping(partials)

where a rough check for overlaps is:

def nonoverlapping(seqn, offset=100):
    """Verify that sequence elements have no overlapping bits.

    Set offset to -Emin to handle the full range of floats.
    """
    cumv = 0L
    for elem in seqn:
        m, exp = frexp(abs(elem))
        v = int(m * 2 ** 53) * 2L ** (exp + offset)
        if cumv & v:
            return False
        cumv |= v
    return True
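As a quick check, summer5 gets the nasty cancellation example from earlier in
the thread exactly right, where plain sum() collapses to zero:

nasty = [1, 1e100, 1, -1e100] * 10000
assert sum(nasty) == 0.0            # massive cancellation
assert summer5(nasty) == 20000.0    # the exact partials survive to the end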


Raymond


Re: [Python-Dev] sum(...) limitation

2014-08-01 Thread Guido van Rossum
No. We just can't put all possible use cases in the docstring. :-)


On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini  wrote:

> help(sum) tells clearly that it should be used to sum numbers and not
> strings, and with strings actually fails.
>
> However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
>
> Is this to be considered a bug?
>
> Andrea
>


-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] sum(...) limitation

2014-08-01 Thread Allen Li
On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:
> No. We just can't put all possible use cases in the docstring. :-)
> 
> 
> On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini  wrote:
> 
> help(sum) tells clearly that it should be used to sum numbers and not
> strings, and with strings actually fails.
> 
> However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
> 
> Is this to be considered a bug?

Can you explain the rationale behind this design decision?  It seems
terribly inconsistent.  Why are only strings explicitly restricted from
being sum()ed?  sum() should either ban everything except numbers or
accept everything that implements addition (duck typing).




Re: [Python-Dev] sum(...) limitation

2014-08-01 Thread Terry Reedy

On 8/2/2014 1:57 AM, Allen Li wrote:

On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:

No. We just can't put all possible use cases in the docstring. :-)


On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini  wrote:

 help(sum) tells clearly that it should be used to sum numbers and not
 strings, and with strings actually fails.

 However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.

 Is this to be considered a bug?


Can you explain the rationale behind this design decision?  It seems
terribly inconsistent.  Why are only strings explicitly restricted from
being sum()ed?  sum() should either ban everything except numbers or
accept everything that implements addition (duck typing).


O(n**2) behavior, ''.join(strings) alternative.


--
Terry Jan Reedy



Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Steven D'Aprano
On Fri, Aug 01, 2014 at 10:57:38PM -0700, Allen Li wrote:
> On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:
> > No. We just can't put all possible use cases in the docstring. :-)
> > 
> > 
> > On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini  wrote:
> > 
> > help(sum) tells clearly that it should be used to sum numbers and not
> > strings, and with strings actually fails.
> > 
> > However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
> > 
> > Is this to be considered a bug?
> 
> Can you explain the rationale behind this design decision?  It seems
> terribly inconsistent.  Why are only strings explicitly restricted from
> being sum()ed?  sum() should either ban everything except numbers or
> accept everything that implements addition (duck typing).

Repeated list and str concatenation both have quadratic O(N**2) 
performance, but people frequently build up strings with + and rarely do 
the same for lists. String concatenation with + is an attractive 
nuisance for many people, including some who actually know better but 
nevertheless do it. Also, for reasons I don't understand, many people 
dislike or cannot remember to use ''.join.

Whatever the reason, repeated string concatenation is common whereas 
repeated list concatenation is much, much rarer (and repeated tuple 
concatenation even rarer), so sum(strings) is likely to be a land mine 
buried in your code while sum(lists) is not. Hence the decision that 
beginners in particular need to be protected from the mistake of using 
sum(strings) but bothering to check for sum(lists) is a waste of time.

Personally, I wish that sum would raise a warning rather than an 
exception.

As for prohibiting anything except numbers with sum(), that in my 
opinion would be a bad idea. sum(vectors), sum(numeric_arrays), 
sum(angles) etc. should all be allowed. The general sum() built-in 
should accept any type that allows + (unless explicitly black-listed), 
while specialist numeric-only sums could go into modules (like 
math.fsum).



-- 
Steven


Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Julian Taylor
On 02.08.2014 08:35, Terry Reedy wrote:
> On 8/2/2014 1:57 AM, Allen Li wrote:
>> On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:
>>> No. We just can't put all possible use cases in the docstring. :-)
>>>
>>>
>>> On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini  wrote:
>>>
>>>  help(sum) tells clearly that it should be used to sum numbers
>>> and not
>>>  strings, and with strings actually fails.
>>>
>>>  However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
>>>
>>>  Is this to be considered a bug?
>>
>> Can you explain the rationale behind this design decision?  It seems
>> terribly inconsistent.  Why are only strings explicitly restricted from
>> being sum()ed?  sum() should either ban everything except numbers or
>> accept everything that implements addition (duck typing).
> 
> O(n**2) behavior, ''.join(strings) alternative.
> 
> 

hm could this be a pure python case that would profit from temporary
elision [0]?

lists could declare the tp_can_elide slot and call list.extend on the
temporary during its tp_add slot instead of creating a new temporary.
extend/realloc can avoid the copy if there is free memory available
after the block.

[0] https://mail.python.org/pipermail/python-dev/2014-June/134826.html
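For comparison, the linear-time spelling that already works today, without any
interpreter changes, is to build a single list instead of summing:

from itertools import chain

list_of_lists = [[1, 2, 3], [4], [], [5, 6]]
flat = list(chain.from_iterable(list_of_lists))   # O(total length)
assert flat == [1, 2, 3, 4, 5, 6]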


Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Stefan Behnel
Julian Taylor schrieb am 02.08.2014 um 12:11:
> On 02.08.2014 08:35, Terry Reedy wrote:
>> On 8/2/2014 1:57 AM, Allen Li wrote:
>>> On Fri, Aug 01, 2014 at 02:51:54PM -0700, Guido van Rossum wrote:
 No. We just can't put all possible use cases in the docstring. :-)


 On Fri, Aug 1, 2014 at 2:48 PM, Andrea Griffini  wrote:

  help(sum) tells clearly that it should be used to sum numbers
 and not
  strings, and with strings actually fails.

  However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.

  Is this to be considered a bug?
>>>
>>> Can you explain the rationale behind this design decision?  It seems
>>> terribly inconsistent.  Why are only strings explicitly restricted from
>>> being sum()ed?  sum() should either ban everything except numbers or
>>> accept everything that implements addition (duck typing).
>>
>> O(n**2) behavior, ''.join(strings) alternative.
> 
> lists could declare the tp_can_elide slot and call list.extend on the
> temporary during its tp_add slot instead of creating a new temporary.
> extend/realloc can avoid the copy if there is free memory available
> after the block.

Yes, i.e. only sometimes. Better not rely on it in your code.

Stefan




Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Alexander Belopolsky
On Sat, Aug 2, 2014 at 3:39 AM, Steven D'Aprano  wrote:

> String concatenation with + is an attractive
> nuisance for many people, including some who actually know better but
> nevertheless do it. Also, for reasons I don't understand, many people
> dislike or cannot remember to use ''.join.
>

Since sum() already treats strings as a special case, why can't it simply
call (an equivalent of) ''.join itself instead of telling the user to do
it?  It does not matter why "many people dislike or cannot remember to use
''.join" - if this is a fact - it should be considered by language
implementors.


Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Stefan Behnel
Alexander Belopolsky schrieb am 02.08.2014 um 16:52:
> On Sat, Aug 2, 2014 at 3:39 AM, Steven D'Aprano wrote:
> 
>> String concatenation with + is an attractive
>> nuisance for many people, including some who actually know better but
>> nevertheless do it. Also, for reasons I don't understand, many people
>> dislike or cannot remember to use ''.join.
> 
> Since sum() already treats strings as a special case, why can't it simply
> call (an equivalent of) ''.join itself instead of telling the user to do
> it?  It does not matter why "many people dislike or cannot remember to use
> ''.join" - if this is a fact - it should be considered by language
> implementors.

I don't think sum(strings) is beautiful enough to merit special cased
support. Special cased rejection sounds like a much better way to ask
people "think again - what's a sum of strings anyway?".

Stefan




Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Steven D'Aprano
On Sat, Aug 02, 2014 at 10:52:07AM -0400, Alexander Belopolsky wrote:
> On Sat, Aug 2, 2014 at 3:39 AM, Steven D'Aprano  wrote:
> 
> > String concatenation with + is an attractive
> > nuisance for many people, including some who actually know better but
> > nevertheless do it. Also, for reasons I don't understand, many people
> > dislike or cannot remember to use ''.join.
> >
> 
> Since sum() already treats strings as a special case, why can't it simply
> call (an equivalent of) ''.join itself instead of telling the user to do
> it?  It does not matter why "many people dislike or cannot remember to use
> ''.join" - if this is a fact - it should be considered by language
> implementors.

It could, of course, but there is virtue in keeping sum simple, 
rather than special-casing who knows how many different types. If sum() 
tries to handle strings, should it do the same for lists? bytearrays? 
array.array? tuple? Where do we stop?

Ultimately it comes down to personal taste. Some people are going to 
wish sum() tried harder to do the clever thing with more types, some 
people are going to wish it was simpler and didn't try to be clever at 
all.

Another argument against excessive cleverness is that it ties sum() to 
one particular idiom or implementation. Today, the idiomatic and 
efficient way to concatenate a lot of strings is with ''.join, but 
tomorrow there might be a new str.concat() method. Who knows? sum() 
shouldn't have to care about these details, since they are secondary to 
sum()'s purpose, which is to add numbers. Anything else is a 
bonus (or perhaps a nuisance).

So, I would argue that when faced with something that is not a number, 
there are two reasonable approaches for sum() to take:

- refuse to handle the type at all; or
- fall back on simple-minded repeated addition.


By the way, I think this whole argument would have been easily 
side-stepped if + was only used for addition, and & used for 
concatenation. Then there would be no question about what sum() should 
do for lists and tuples and strings: raise TypeError.



-- 
Steven


Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread MRAB

On 2014-08-02 16:27, Steven D'Aprano wrote:

On Sat, Aug 02, 2014 at 10:52:07AM -0400, Alexander Belopolsky wrote:

On Sat, Aug 2, 2014 at 3:39 AM, Steven D'Aprano  wrote:

> String concatenation with + is an attractive
> nuisance for many people, including some who actually know better but
> nevertheless do it. Also, for reasons I don't understand, many people
> dislike or cannot remember to use ''.join.
>

Since sum() already treats strings as a special case, why can't it simply
call (an equivalent of) ''.join itself instead of telling the user to do
it?  It does not matter why "many people dislike or cannot remember to use
''.join" - if this is a fact - it should be considered by language
implementors.


It could, of course, but there is virtue in keeping sum simple,
rather than special-casing who knows how many different types. If sum()
tries to handle strings, should it do the same for lists? bytearrays?
array.array? tuple? Where do we stop?


We could leave any special-casing to the classes themselves:

def sum(iterable, start=0):
    sum_func = getattr(type(start), '__sum__', None)

    if sum_func is None:
        result = start

        for item in iterable:
            result = result + item
    else:
        result = sum_func(start, iterable)

    return result
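For illustration, a class could then opt in with its own strategy -- a
hypothetical sketch only, since __sum__ is just the protocol proposed above:

class JoiningStr(str):
    def __sum__(self, iterable):
        # Delegate to join() so the whole concatenation is linear time.
        return self + ''.join(iterable)

# With the sum() above, sum(['a', 'b', 'c'], JoiningStr('')) would give 'abc'.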


Ultimately it comes down to personal taste. Some people are going to
wish sum() tried harder to do the clever thing with more types, some
people are going to wish it was simpler and didn't try to be clever at
all.

Another argument against excessive cleverness is that it ties sum() to
one particular idiom or implementation. Today, the idiomatic and
efficient way to concatenate a lot of strings is with ''.join, but
tomorrow there might be a new str.concat() method. Who knows? sum()
shouldn't have to care about these details, since they are secondary to
sum()'s purpose, which is to add numbers. Anything else is a
bonus (or perhaps a nuisance).

So, I would argue that when faced with something that is not a number,
there are two reasonable approaches for sum() to take:

- refuse to handle the type at all; or
- fall back on simple-minded repeated addition.


By the way, I think this whole argument would have been easily
side-stepped if + was only used for addition, and & used for
concatenation. Then there would be no question about what sum() should
do for lists and tuples and strings: raise TypeError.





Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread Alexander Belopolsky
On Sat, Aug 2, 2014 at 11:06 AM, Stefan Behnel  wrote:

> I don't think sum(strings) is beautiful enough


sum(strings) is more beautiful than ''.join(strings) in my view, but
unfortunately it does not work even for lists because the initial value
defaults to 0.

sum(strings, '') and ''.join(strings) are equally ugly and non-obvious
because they require an empty string.  Empty containers are an advanced
concept and it is unfortunate that a simple job of concatenating a list of
(non-empty!) strings exposes the user to it.


Re: [Python-Dev] sum(...) limitation

2014-08-02 Thread David Wilson
On Sat, Aug 02, 2014 at 05:39:12PM +1000, Steven D'Aprano wrote:

> Repeated list and str concatenation both have quadratic O(N**2)
> performance, but people frequently build up strings with + and rarely
> do the same for lists. String concatenation with + is an attractive
> nuisance for many people, including some who actually know better but
> nevertheless do it. Also, for reasons I don't understand, many people
> dislike or cannot remember to use ''.join.

join() isn't preferable in cases where it damages readability while
simultaneously providing zero or negative performance benefit, such as
when concatenating a few short strings, e.g. while adding a prefix to a
filename.

Although it's true that join() is automatically the safer option, and
especially when dealing with user supplied data, the net harm caused by
teaching rote and ceremony seems far less desirable compared to fixing a
trivial slowdown in a script, if that slowdown ever became apparent.

Another (twisted) interpretation is that since the quadratic behaviour
is a CPython implementation detail, and there are alternatives where
__add__ is constant time, encouraging users to code against
implementation details becomes undesirable. In our twisty world, __add__
becomes *preferable* since the resulting programs more closely resemble
pseudo-code.

$ cat t.py
a = 'this '
b = 'is a string'
c = 'as we can tell'

def x():
    return a + b + c

def y():
    return ''.join([a, b, c])

$ python -m timeit -s 'import t' 't.x()'
100 loops, best of 3: 0.477 usec per loop

$ python -m timeit -s 'import t' 't.y()'
100 loops, best of 3: 0.695 usec per loop


David


Re: [Python-Dev] sum(...) limitation

2014-08-04 Thread Chris Barker
On Sat, Aug 2, 2014 at 1:35 PM, David Wilson  wrote:

> > Repeated list and str concatenation both have quadratic O(N**2)
> > performance, but people frequently build up strings with +
>


> join() isn't preferable in cases where it damages readability while
> simultaneously providing zero or negative performance benefit, such as
> when concatenating a few short strings, e.g. while adding a prefix to a
> filename.
>

Good point -- I was trying to make the point about .join() vs + for strings
in an intro python class last year, and made the mistake of having the
students test the performance.

You need to concatenate a LOT of strings to see any difference at all --  I
know that O() of algorithms is unavoidable, but between efficient python
optimizations and an apparently good memory allocator, it's really a
practical non-issue.


> Although it's true that join() is automatically the safer option, and
> especially when dealing with user supplied data, the net harm caused by
> teaching rote and ceremony seems far less desirable compared to fixing a
> trivial slowdown in a script, if that slowdown ever became apparent.
>

and it rarely would.

Blocking sum( some_strings) because it _might_ have poor performance seems
awfully pedantic.

As a long-time numpy user, I think sum(a_long_list_of_numbers) has
pathetically bad performance, but I wouldn't block it!

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] sum(...) limitation

2014-08-04 Thread Steven D'Aprano
On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:

> Good point -- I was trying to make the point about .join() vs + for strings
> in an intro python class last year, and made the mistake of having the
> students test the performance.
> 
> You need to concatenate a LOT of strings to see any difference at all --  I
> know that O() of algorithms is unavoidable, but between efficient python
> optimizations and a an apparently good memory allocator, it's really a
> practical non-issue.

If only that were the case, but it isn't. Here's a cautionary tale for 
how using string concatenation can blow up in your face:

Chris Withers asks for help debugging HTTP slowness:
https://mail.python.org/pipermail/python-dev/2009-August/091125.html

and publishes some times:
https://mail.python.org/pipermail/python-dev/2009-September/091581.html

(notice that Python was SIX HUNDRED times slower than wget or IE)

and Simon Cross identified the problem:
https://mail.python.org/pipermail/python-dev/2009-September/091582.html

leading Guido to describe the offending code as an embarrassment.

It shouldn't be hard to demonstrate the difference between repeated 
string concatenation and join, all you need do is defeat sum()'s 
prohibition against strings. Run this bit of code, and you'll see a 
significant difference in performance, even with CPython's optimized 
concatenation:

# --- cut ---
class Faker:
    def __add__(self, other):
        return other

x = Faker()
strings = list("Hello World!")
assert ''.join(strings) == sum(strings, x)

from timeit import Timer
setup = "from __main__ import x, strings"
t1 = Timer("''.join(strings)", setup)
t2 = Timer("sum(strings, x)", setup)

print (min(t1.repeat()))
print (min(t2.repeat()))
# --- cut ---


On my computer, using Python 2.7, I find the version using sum is nearly 
4.5 times slower, and with 3.3 about 4.2 times slower. That's with a 
mere twelve substrings, hardly "a lot". I tried running it on IronPython 
with a slightly larger list of substrings, but I got sick of waiting for 
it to finish.

If you want to argue that microbenchmarks aren't important, well, I 
might agree with you in general, but in the specific case of string 
concatenation there's that pesky factor of 600 slowdown in real world 
code to argue with.


> Blocking sum( some_strings) because it _might_ have poor performance seems
> awfully pedantic.

The rationale for explicitly prohibiting strings while merely implicitly 
discouraging other non-numeric types is that beginners, who are least 
likely to understand why their code occasionally and unpredictably 
becomes catastrophically slow, are far more likely to sum strings than 
sum tuples or lists.

(I don't entirely agree with this rationale, I'd prefer a warning rather 
than an exception.)



-- 
Steven


Re: [Python-Dev] sum(...) limitation

2014-08-04 Thread Stefan Behnel
Steven D'Aprano schrieb am 04.08.2014 um 20:10:
> On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:
> 
>> Good point -- I was trying to make the point about .join() vs + for strings
>> in an intro python class last year, and made the mistake of having the
>> students test the performance.
>>
>> You need to concatenate a LOT of strings to see any difference at all --  I
>> know that O() of algorithms is unavoidable, but between efficient python
>> optimizations and a an apparently good memory allocator, it's really a
>> practical non-issue.
> 
> If only that were the case, but it isn't. Here's a cautionary tale for 
> how using string concatenation can blow up in your face:
> 
> Chris Withers asks for help debugging HTTP slowness:
> https://mail.python.org/pipermail/python-dev/2009-August/091125.html
> 
> and publishes some times:
> https://mail.python.org/pipermail/python-dev/2009-September/091581.html
> 
> (notice that Python was SIX HUNDRED times slower than wget or IE)
> 
> and Simon Cross identified the problem:
> https://mail.python.org/pipermail/python-dev/2009-September/091582.html
> 
> leading Guido to describe the offending code as an embarrassment.

Thanks for digging up that story.


>> Blocking sum( some_strings) because it _might_ have poor performance seems
>> awfully pedantic.
> 
> The rationale for explicitly prohibiting strings while merely implicitly 
> discouraging other non-numeric types is that beginners, who are least 
> likely to understand why their code occasionally and unpredictably 
> becomes catastrophically slow, are far more likely to sum strings than 
> sum tuples or lists.

Well, the obvious difference between strings and lists (not tuples) is that
strings are immutable, so it would seem more obvious at first sight to
concatenate strings than to do the same thing with lists, which can easily
be extended (they are clearly designed for that). This rational may not
apply as much to beginners as to more experienced programmers, but it
should still explain why this is so often discussed in the context of
string concatenation and pretty much never for lists.

As for tuples, their most common use case is to represent a fixed length
sequence of semantically different values. That renders their concatenation
a sufficiently uncommon use case to make no-one ask loudly for "large
scale" sum(tuples) support.

Basically, extending lists is an obvious thing, but getting multiple
strings joined without using "+"-concatenating them isn't.

Stefan




Re: [Python-Dev] sum(...) limitation

2014-08-04 Thread Jim J. Jewett



Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in
https://mail.python.org/pipermail/python-dev/2014-August/135623.html ):


> Andrea Griffini  wrote:

>>However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.

> hm could this be a pure python case that would profit from temporary
> elision [ https://mail.python.org/pipermail/python-dev/2014-June/134826.html 
> ]?

> lists could declare the tp_can_elide slot and call list.extend on the
> temporary during its tp_add slot instead of creating a new temporary.
> extend/realloc can avoid the copy if there is free memory available
> after the block.

Yes, with all the same problems.

When dealing with a complex object, how can you be sure that __add__
won't need access to the original values during the entire computation?
It works with matrix addition, but not with matrix multiplication.
Depending on the details of the implementation, it could even fail for
a sort of sliding-neighbor addition similar to the original justification.

Of course, then those tricky implementations should not define an
_eliding_add_, but maybe the builtin objects still should?  After all,
a plain old list is OK to re-use.  Unless the first evaluation to create
it ends up evaluating an item that has side effects...

In the end, it looks like a lot of machinery (and extra checks that may
slow down the normal small-object case) for something that won't be used
all that often.

Though it is really tempting to consider a compilation mode that assumes
objects and builtins will be "normal", and lets you replace the entire
above expression with compile-time [1, 2, 3, 4, 5, 6].  Would writing
objects to that stricter standard and encouraging its use (and maybe
offering a few AST transforms to auto-generate the out-parameters?) work
as well for those who do need the speed?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ



Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Chris Barker
On Mon, Aug 4, 2014 at 11:10 AM, Steven D'Aprano 
wrote:

> On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:
>
> > Good point -- I was trying to make the point about .join() vs + for
> strings
> > in an intro python class last year, and made the mistake of having the
> > students test the performance.
> >
> > You need to concatenate a LOT of strings to see any difference at all
>


> If only that were the case, but it isn't. Here's a cautionary tale for
> how using string concatenation can blow up in your face:
>
> Chris Withers asks for help debugging HTTP slowness:
> https://mail.python.org/pipermail/python-dev/2009-August/091125.html



Thanks for that -- interesting story. Note that that was not using sum() in
that case though, which is really the issue at hand.

It shouldn't be hard to demonstrate the difference between repeated
> string concatenation and join, all you need do is defeat sum()'s
> prohibition against strings. Run this bit of code, and you'll see a
> significant difference in performance, even with CPython's optimized
> concatenation:
>

well, that does look compelling, but what it shows is that
sum(a_list_of_strings) is slow compared to ''.join(a_list_of_strings). That
doesn't surprise me a bit -- this is really similar to why:

a_numpy_array.sum()

is going to be a lot faster than:

sum(a_numpy_array)

and why I'll tell everyone that is working with lots of numbers to use
numpy. ndarray.sum knows what data type it's dealing with, and can do the
loop in C. Similarly with ''.join() (though not as optimized).

But I'm not sure we're seeing the big O difference here at all -- rather
the extra calls through each list element's __add__ method.

In the case where you already HAVE a big list of strings, then yes, ''.join
is the clear winner.

But I think the case we're often talking about, and I've tested with
students, is when you are building up a long string on the fly out of
little strings. In that case, you need to profile the full "append to list,
then call join()", not just the join() call:

# continued adding of strings ( O(n^2)? )
In [6]: def add_strings(l):
   ...:     s = ''
   ...:     for i in l:
   ...:         s+=i
   ...:     return s

Using append and then join ( O(n)? )
In [14]: def join_strings(list_of_strings):
    ...:     l = []
    ...:     for i in list_of_strings:
    ...:         l.append(i)
    ...:     return ''.join(l)

In [23]: timeit add_strings(strings)
100 loops, best of 3: 831 ns per loop

In [24]: timeit join_strings(strings)
10 loops, best of 3: 1.87 µs per loop

## hmm -- concatenating is faster for a small list of tiny strings

In [31]: strings = list('Hello World')* 1000

strings *= 1000
In [26]: timeit add_strings(strings)
1000 loops, best of 3: 932 µs per loop

In [27]: timeit join_strings(strings)
1000 loops, best of 3: 967 µs per loop

## now about the same.

In [31]: strings = list('Hello World')* 1

In [29]: timeit add_strings(strings)
100 loops, best of 3: 9.44 ms per loop

In [30]: timeit join_strings(strings)
100 loops, best of 3: 10.1 ms per loop

still about the same?

In [31]: strings = list('Hello World')* 100

In [32]: timeit add_strings(strings)
1 loops, best of 3: 1.27 s per loop

In [33]: timeit join_strings(strings)
1 loops, best of 3: 1.05 s per loop

there we go -- slight advantage to joining.

So this is why we've said that the common wisdom about string concatenating
isn't really a practical issue.

But if you already have the strings all in a list, then yes, join() is a
major win over sum()

In fact, I tried the above with sum() -- and it was really, really slow. So
slow I didn't have the patience to wait for it.

Here is a smaller example:

In [22]: strings = list('Hello World')* 1

In [23]: timeit add_strings(strings)
100 loops, best of 3: 9.61 ms per loop

In [24]: timeit sum( strings, Faker() )
1 loops, best of 3: 246 ms per loop

So why is sum() so darn slow with strings compared to a simple loop with +=?

(and if I try it with a list 10 times as long it takes "forever")

Perhaps the http issue cited was before some nifty optimizations in current
CPython?

-Chris





-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Ethan Furman

On 08/07/2014 03:06 PM, Chris Barker wrote:

[snip timings, etc.]

I don't remember where, but I believe that cPython has an optimization built in for repeated string concatenation, which 
is probably why you aren't seeing big differences between the + and the sum().


A little testing shows how to defeat that optimization:

  blah = ''
  for string in ['booyah'] * 10:
      blah = string + blah

Note the reversed order of the addition.

--> timeit.Timer("for string in ['booya'] * 10: blah = blah + string", "blah = ''").repeat(3, 1)
[0.021117210388183594, 0.013692855834960938, 0.00768280029296875]

--> timeit.Timer("for string in ['booya'] * 10: blah = string + blah", "blah = ''").repeat(3, 1)
[15.301048994064331, 15.343288898468018, 15.268463850021362]

--
~Ethan~


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Ethan Furman

On 08/07/2014 04:01 PM, Ethan Furman wrote:

On 08/07/2014 03:06 PM, Chris Barker wrote:

 the + and the sum().


Yeah, that 'sum' should be 'join'  :/

--
~Ethan~


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Ethan Furman

On 08/07/2014 04:01 PM, Ethan Furman wrote:

On 08/07/2014 03:06 PM, Chris Barker wrote:

--> timeit.Timer("for string in ['booya'] * 10: blah = blah + string", "blah = ''").repeat(3, 1)
[0.021117210388183594, 0.013692855834960938, 0.00768280029296875]

--> timeit.Timer("for string in ['booya'] * 10: blah = string + blah", "blah = ''").repeat(3, 1)
[15.301048994064331, 15.343288898468018, 15.268463850021362]


Oh, and the join() timings:

--> timeit.Timer("blah = ''.join(['booya'] * 10)", "blah = ''").repeat(3, 1)
[0.0014629364013671875, 0.0014190673828125, 0.0011930465698242188]

So, + is three orders of magnitude slower than join.

--
~Ethan~


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Chris Barker
On Thu, Aug 7, 2014 at 4:01 PM, Ethan Furman  wrote:

> I don't remember where, but I believe that cPython has an optimization
> built in for repeated string concatenation, which is probably why you
> aren't seeing big differences between the + and the sum().
>

Indeed -- clearly so.

A little testing shows how to defeat that optimization:

>   blah = ''
>   for string in ['booyah'] * 10:
>       blah = string + blah
>
> Note the reversed order of the addition.
>

thanks -- cool trick.

> Oh, and the join() timings:
> --> timeit.Timer("blah = ''.join(['booya'] * 10)", "blah = ''").repeat(3, 1)
> [0.0014629364013671875, 0.0014190673828125, 0.0011930465698242188]
> So, + is three orders of magnitude slower than join.


only one if you use the optimized form of + and not even that if you
need to build up the list first, which is the common use-case.

So my final question is this:

repeated string concatenation is not the "recommended" way to do this --
but nevertheless, cPython has an optimization that makes it fast and
efficient, to the point that there is no practical performance reason to
prefer appending to a list and calling join()) afterward.

So why not apply a similar optimization to sum() for strings?

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Ethan Furman

On 08/08/2014 08:23 AM, Chris Barker wrote:


So my final question is this:

repeated string concatenation is not the "recommended" way to do this -- but 
nevertheless, cPython has an optimization
that makes it fast and efficient, to the point that there is no practical 
performance reason to prefer appending to a
list and calling join() afterward.

So why not apply a similar optimization to sum() for strings?


That I cannot answer -- I find the current situation with sum highly irritating.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Raymond Hettinger

On Aug 8, 2014, at 11:09 AM, Ethan Furman  wrote:

>> So why not apply a similar optimization to sum() for strings?
> 
> That I cannot answer -- I find the current situation with sum highly 
> irritating.
> 

It is only irritating if you are misusing sum().

The str.__add__ optimization was put in because
it was common for people to accidentally incur
the performance penalty.

With sum(), we don't seem to have that problem
(I don't see people using it to add lists except
just to show that could be done).


Raymond


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Ethan Furman

On 08/08/2014 05:34 PM, Raymond Hettinger wrote:


On Aug 8, 2014, at 11:09 AM, Ethan Furman <et...@stoneleaf.us> wrote:


So why not apply a similar optimization to sum() for strings?


That I cannot answer -- I find the current situation with sum highly irritating.



It is only irritating if you are misusing sum().


Actually, I have an advanced degree in irritability -- perhaps you've noticed 
in the past?

I don't use sum at all, or at least very rarely, and it still irritates me.  It feels like I'm being told I'm too dumb 
to figure out when I can safely use sum and when I can't.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Alexander Belopolsky
On Fri, Aug 8, 2014 at 8:56 PM, Ethan Furman  wrote:

> I don't use sum at all, or at least very rarely, and it still irritates me.


You are not alone.  When I see sum([a, b, c]), I think it is a + b + c, but
in Python it is 0 + a + b + c.  If we had a "join" operator for strings
that is different from + - then sure, I would not try to use sum to join
strings, but we don't.  I have always thought that sum(x) is just a
shorthand for reduce(operator.add, x), but again it is not so in Python.
 While "sum should only be used for numbers,"  it turns out it is not a
good choice for floats - use math.fsum.  While "strings are blocked because
sum is slow," numpy arrays with millions of elements are not.  And try to
explain to someone that sum(x) is bad on a numpy array, but abs(x) is fine.
 Why have builtin sum at all if its use comes with so many caveats?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Steven D'Aprano
On Fri, Aug 08, 2014 at 10:20:37PM -0400, Alexander Belopolsky wrote:
> On Fri, Aug 8, 2014 at 8:56 PM, Ethan Furman  wrote:
> 
> > I don't use sum at all, or at least very rarely, and it still irritates me.
> 
> 
> You are not alone.  When I see sum([a, b, c]), I think it is a + b + c, but
> in Python it is 0 + a + b + c.  If we had a "join" operator for strings
> that is different form + - then sure, I would not try to use sum to join
> strings, but we don't.

I've long believed that + is the wrong operator for concatenating 
strings, and that & makes a much better operator. We wouldn't be having 
these interminable arguments about using sum() to concatenate strings 
(and lists, and tuples) if the & operator was used for concatenation and 
+ was only used for numeric addition.


> I have always thought that sum(x) is just a
> shorthand for reduce(operator.add, x), but again it is not so in Python.

The signature of reduce is:

reduce(...)
reduce(function, sequence[, initial]) -> value

so sum() is (at least conceptually) a shorthand for reduce:

def sum(values, initial=0):
    return reduce(operator.add, values, initial)

but that's an implementation detail, not a language promise, and sum() 
is free to differ from that simple version. Indeed, even the public 
interface is different, since sum() prohibits using a string as the 
initial value and only promises to work with numbers. The fact that it 
happens to work with lists and tuples is somewhat of an accident of 
implementation.
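
A quick interactive check of that conceptual equivalence, and of the string
prohibition (the exact TypeError wording below is CPython's and may differ
elsewhere):

>>> from functools import reduce
>>> import operator
>>> sum([1, 2, 3]) == reduce(operator.add, [1, 2, 3], 0)
True
>>> sum([[1], [2]], [])          # happens to work, but is not promised
[1, 2]
>>> sum(['a', 'b'], '')
Traceback (most recent call last):
  ...
TypeError: sum() can't sum strings [use ''.join(seq) instead]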


> While "sum should only be used for numbers,"  it turns out it is not a
> good choice for floats - use math.fsum.

Correct. And if you (generic you, not you personally) do not understand 
why simple-minded addition of floats is troublesome, then you're going 
to have a world of trouble. Anyone who is disturbed by the question of 
"should I use sum or math.fsum?" probably shouldn't be writing serious 
floating point code at all. Floating point computations are hard, and 
there is simply no escaping this fact.


> While "strings are blocked because
> sum is slow," numpy arrays with millions of elements are not.

That's not a good example. Strings are potentially O(N**2), which means 
not just "slow" but *agonisingly* slow, as in taking a week -- no 
exaggeration -- to concat a million strings. If it takes a microsecond to 
concat two strings, then 1e6**2 such concatenations could take over 
eleven days. Slowness of such magnitude might as well be "the process 
has locked up".
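
Back-of-the-envelope, assuming roughly a microsecond per concatenation step:

days = (1e6 ** 2) * 1e-6 / 86400   # 1e12 steps at ~1 microsecond each, in days
print(round(days, 1))              # -> 11.6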

In comparison, summing a numpy array with a million entries is not 
really slow in that sense. The time taken is proportional to the number 
of entries, and differs from summing a list only by a constant factor.

Besides, in the case of strings it is quite simple to decide "is the 
initial value a string?", whereas with lists or numpy arrays it's quite 
hard to decide "is the list or array so huge that the user will consider 
this too slow?". What counts as "too slow" depends on the machine it is 
running on, what other processes are running, and the user's mood, and 
leads to the silly result that summing an array of N items succeeds but 
N+1 items doesn't. So in the case of strings, it is easy to make a
blanket prohibition, but in the case of lists or arrays, there is no 
reasonable place to draw the line.


> And try to
> explain to someone that sum(x) is bad on a numpy array, but abs(x) is fine.

I think that's because sum() has to box up each and every element in the 
array into an object, which is wasteful, while abs() can delegate to a 
specialist array.__abs__ method. Although that's not something beginners 
should be expected to understand, no serious Python programmer should be 
confused by this. As a programmer, we should expect to have some 
understanding of our tools, how they work, their limitations, and when 
to use a different tool. That's why numpy has its own version of sum 
which is designed to work specifically on numpy arrays. Use a specialist 
tool for a specialist job:

py> with Stopwatch():
... sum(carray)  # carray is a numpy array of 7500 floats.
...
11250.0
time taken: 52.659770 seconds
py> with Stopwatch():
... numpy.sum(carray)
...
11250.0
time taken: 0.161263 seconds


>  Why have builtin sum at all if its use comes with so many caveats?

Because sum() is a perfectly reasonable general purpose tool for adding 
up small amounts of numbers where high floating point precision is not 
required. It has been included as a built-in because Python comes with 
"batteries included", and a basic function for adding up a few numbers 
is an obvious, simple battery. But serious programmers should be 
comfortable with the idea that you use the right tool for the right job.

If you visit a hardware store, you will find that even something as 
simple as the hammer exists in many specialist varieties. There are tack 
hammers, claw hammers, framing hammers, lump hammers, rubber and wooden 
mallets, "brass" non-sparking 

Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Greg Ewing

Steven D'Aprano wrote:
I've long believed that + is the wrong operator for concatenating 
strings, and that & makes a much better operator.


Do you have a reason for preferring '&' in particular, or
do you just want something different from '+'?

Personally I can't see why "bitwise and" on strings should
be a better metaphor for concatenation than "addition". :-)

--
Greg

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-08 Thread Antoine Pitrou

Le 09/08/2014 01:08, Steven D'Aprano a écrit :

On Fri, Aug 08, 2014 at 10:20:37PM -0400, Alexander Belopolsky wrote:

On Fri, Aug 8, 2014 at 8:56 PM, Ethan Furman  wrote:


I don't use sum at all, or at least very rarely, and it still irritates me.


You are not alone.  When I see sum([a, b, c]), I think it is a + b + c, but
in Python it is 0 + a + b + c.  If we had a "join" operator for strings
that is different form + - then sure, I would not try to use sum to join
strings, but we don't.


I've long believed that + is the wrong operator for concatenating
strings, and that & makes a much better operator. We wouldn't be having
these interminable arguments about using sum() to concatenate strings
(and lists, and tuples) if the & operator was used for concatenation and
+ was only used for numeric addition.


Come on. These arguments are interminable because many people (including 
you) love feeding interminable arguments. No need to blame Python for that.


And for that matter, this interminable discussion should probably have 
taken place on python-ideas or even python-list.


Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-09 Thread Stephen J. Turnbull
Alexander Belopolsky writes:

 > Why have builtin sum at all if its use comes with so many caveats?

Because we already have it.  If the caveats had been known when it was
introduced, maybe it wouldn't have been.  The question is whether you
can convince python-dev that it's worth changing the definition of
sum().  IMO that's going to be very hard to do.  All the suggestions
I've seen so far are (IMHO, YMMV) just as ugly as the present
situation.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-09 Thread Paul Moore
On 9 August 2014 06:08, Steven D'Aprano  wrote:
> py> with Stopwatch():
> ... sum(carray)  # carray is a numpy array of 7500 floats.
> ...
> 11250.0
> time taken: 52.659770 seconds
> py> with Stopwatch():
> ... numpy.sum(carray)
> ...
> 11250.0
> time taken: 0.161263 seconds
>
>
>>  Why have builtin sum at all if its use comes with so many caveats?
>
> Because sum() is a perfectly reasonable general purpose tool for adding
> up small amounts of numbers where high floating point precision is not
> required. It has been included as a built-in because Python comes with
> "batteries included", and a basic function for adding up a few numbers
> is an obvious, simple battery. But serious programmers should be
> comfortable with the idea that you use the right tool for the right job.

Changing the subject a little, but the Stopwatch function you used up
there is "an obvious, simple battery" for timing a chunk of code at
the interactive prompt. I'm amazed there's nothing like it in the
timeit module...
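
For what it's worth, a minimal sketch of such a helper (the Stopwatch above
is presumably Steven's own; this is just one way it might look):

import time
from contextlib import contextmanager

@contextmanager
def stopwatch():
    # Time the body of the with-block and report it when the block exits.
    start = time.perf_counter()
    try:
        yield
    finally:
        print("time taken: %f seconds" % (time.perf_counter() - start))

Used at the prompt as "with stopwatch(): ...".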

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-09 Thread Alexander Belopolsky
On Sat, Aug 9, 2014 at 3:08 AM, Stephen J. Turnbull 
wrote:

> All the suggestions
> I've seen so far are (IMHO, YMMV) just as ugly as the present
> situation.
>

What is ugly about allowing strings?  CPython certainly has a way to
make sum(x, '') at least as efficient as y='';for in in x; y+= x is now.
 What is ugly about making sum([a, b, ..]) be equivalent to a + b + .. so
that non-empty lists of arbitrary types can be "summed"?  What is ugly
about harmonizing sum(x) and reduce(operator.add, x) behaviors?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-09 Thread Alexander Belopolsky
On Sat, Aug 9, 2014 at 2:02 PM, Alexander Belopolsky <
alexander.belopol...@gmail.com> wrote:

> y='';for in in x; y+= x


Should have been

y = ''
for i in x: y += i
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-09 Thread Alexander Belopolsky
On Sat, Aug 9, 2014 at 1:08 AM, Steven D'Aprano  wrote:

> We wouldn't be having
> these interminable arguments about using sum() to concatenate strings
> (and lists, and tuples) if the & operator was used for concatenation and
> + was only used for numeric addition.
>

But we would probably have a similar discussion about all(). :-)

Use of + is consistent with the use of * for repetition.  What would you
use for repetition if you use & instead?

Compare, for example

s + ' ' * (n - len(s))

and

s & ' ' * (n - len(s))

Which one is clearer?

It is sum() that needs to be fixed, not +.  Not having sum([a, b])
equivalent to a + b for any a, b pair is hard to justify.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-09 Thread Devin Jeanpierre
On Sat, Aug 9, 2014 at 12:20 PM, Alexander Belopolsky
 wrote:
>
> On Sat, Aug 9, 2014 at 1:08 AM, Steven D'Aprano  wrote:
>>
>> We wouldn't be having
>> these interminable arguments about using sum() to concatenate strings
>> (and lists, and tuples) if the & operator was used for concatenation and
>> + was only used for numeric addition.
>
>
> But we would probably have a similar discussion about all(). :-)
>
> Use of + is consistent with the use of * for repetition.  What would you use
> use for repetition if you use & instead?

If the only goal is to not be tempted to use sum() for string
concatenation, how about using *? This is more consistent with
mathematics terminology, where a * b is not necessarily the same as b
* a (unlike +, which is commutative). As an example, consider matrix
multiplication. Then, to answer your question, repetition would have
been s ** n.

(In fact, this is the notation for concatenation and repetition used
in formal language theory.)
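
Purely as an illustration of that convention, a toy wrapper class
(hypothetical, not a proposal for str itself) in which * concatenates and
** repeats:

class Word:
    # Toy string wrapper: * is concatenation, ** is repetition.
    def __init__(self, chars):
        self.chars = str(chars)
    def __mul__(self, other):
        return Word(self.chars + other.chars)   # Word('ab') * Word('c')  -> Word('abc')
    def __pow__(self, n):
        return Word(self.chars * n)             # Word('ab') ** 3         -> Word('ababab')
    def __repr__(self):
        return "Word(%r)" % self.chars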

(If we really super wanted to add this to Python, obviously we'd use
the @ and @@ operators. But it's a bit late for that.)

-- Devin
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-10 Thread Stephen J. Turnbull
Alexander Belopolsky writes:
 > On Sat, Aug 9, 2014 at 3:08 AM, Stephen J. Turnbull 
 > wrote:
 > 
 > > All the suggestions
 > > I've seen so far are (IMHO, YMMV) just as ugly as the present
 > > situation.
 > >
 > 
 > What is ugly about allowing strings?  CPython certainly has a way to to
 > make sum(x, '')

sum(it, '') itself is ugly.  As I say, YMMV, but in general last I
heard arguments that are usually constants drawn from a small set of
constants are considered un-Pythonic; a separate function to express
that case is preferred.  I like the separate function style.

And that's the current situation, except that in the case of strings
it turns out to be useful to allow for "sums" that have "glue" at the
joints, so it's spelled as a string method rather than a builtin: eg,
", ".join(paramlist).

Actually ... if I were a fan of the "".join() idiom, I'd seriously
propose 0.sum(numeric_iterable) as the RightThang{tm].  Then we could
deprecate "".join(string_iterable) in favor of "".sum(string_iterable)
(with the same efficient semantics).

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-10 Thread Barry Warsaw
On Aug 10, 2014, at 05:24 PM, Stephen J. Turnbull wrote:

>Actually ... if I were a fan of the "".join() idiom, I'd seriously
>propose 0.sum(numeric_iterable) as the RightThang{tm].  Then we could
>deprecate "".join(string_iterable) in favor of "".sum(string_iterable)
>(with the same efficient semantics).

Ever since ''.join was added, there has been vague talk about adding a join()
built-in.  If the semantics and argument syntax can be worked out, I'd still
be in favor of that.  Probably deserves a PEP and a millithread community
bikeshed paintdown.

-Barry
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-10 Thread Glenn Linderman

On 8/10/2014 1:24 AM, Stephen J. Turnbull wrote:

Actually ... if I were a fan of the "".join() idiom, I'd seriously
propose 0.sum(numeric_iterable) as the RightThang{tm].  Then we could
deprecate "".join(string_iterable) in favor of "".sum(string_iterable)
(with the same efficient semantics).
Actually, there is no need to wait for 0.sum() to propose "".sum... but 
it is only a spelling change, so no real benefit.


Thinking about this more, maybe it should be a class function, so that 
it wouldn't require an instance:


str.sum( iterable_containing_strings )

[ or  str.join( iterable_containing_strings ) ]
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-10 Thread R. David Murray
On Sun, 10 Aug 2014 13:12:26 -0700, Glenn Linderman  
wrote:
> On 8/10/2014 1:24 AM, Stephen J. Turnbull wrote:
> > Actually ... if I were a fan of the "".join() idiom, I'd seriously
> > propose 0.sum(numeric_iterable) as the RightThang{tm].  Then we could
> > deprecate "".join(string_iterable) in favor of "".sum(string_iterable)
> > (with the same efficient semantics).
> Actually, there is no need to wait for 0.sum() to propose "".sum... but 
> it is only a spelling change, so no real benefit.
> 
> Thinking about this more, maybe it should be a class function, so that 
> it wouldn't require an instance:
> 
> str.sum( iterable_containing_strings )
> 
> [ or  str.join( iterable_containing_strings ) ]

That's how it used to be spelled in python2.

--David
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-10 Thread R. David Murray
On Sun, 10 Aug 2014 13:12:26 -0700, Glenn Linderman  
wrote:
> On 8/10/2014 1:24 AM, Stephen J. Turnbull wrote:
> > Actually ... if I were a fan of the "".join() idiom, I'd seriously
> > propose 0.sum(numeric_iterable) as the RightThang{tm].  Then we could
> > deprecate "".join(string_iterable) in favor of "".sum(string_iterable)
> > (with the same efficient semantics).
> Actually, there is no need to wait for 0.sum() to propose "".sum... but 
> it is only a spelling change, so no real benefit.
> 
> Thinking about this more, maybe it should be a class function, so that 
> it wouldn't require an instance:
> 
> str.sum( iterable_containing_strings )
> 
> [ or  str.join( iterable_containing_strings ) ]

Sorry, I mean 'string.join' is how it used to be spelled.  Making it a
class method is indeed slightly different.

--David
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-10 Thread Stephen J. Turnbull
Glenn Linderman writes:

 > On 8/10/2014 1:24 AM, Stephen J. Turnbull wrote:
 > > Actually ... if I were a fan of the "".join() idiom, I'd seriously
 > > propose 0.sum(numeric_iterable) as the RightThang{tm].  Then we could
 > > deprecate "".join(string_iterable) in favor of "".sum(string_iterable)
 > > (with the same efficient semantics).

 > Actually, there is no need to wait for 0.sum() to propose "".sum... but 
 > it is only a spelling change, so no real benefit.

IMO it's worse than merely a spelling change, because (1) "join" is a
more evocative term for concatenating strings than "sum" and (2) I
don't know of any other sums that allow "glue".

I'm overall -1 on trying to change the current situation (except for
adding a join() builtin or str.join class method).  We could probably
fix everything in a static-typed language (because that would allow
picking an initial object of the appropriate type), but without that
we need to pick a default of some particular type, and 0 makes the
most sense.

I can understand the desire of people who want to use the same syntax
for summing an iterable of numbers and for concatenating an iterable
of strings, but to me they're really not even formally the same in
practical use.  I'm very sympathetic to Steven's explanation that "we
wouldn't be having this discussion if we used a different operator for
string concatenation".  Although that's not the whole story: in
practice even numerical sums get split into multiple functions because
floating point addition isn't associative, and so needs careful
treatment to preserve accuracy.  At that point I'm strongly +1 on
abandoning attempts to "rationalize" summation.

I'm not sure how I'd feel about raising an exception if you try to sum
any iterable containing misbehaved types like float.  But not only
would that be a Python 4 effort due to backward incompatibility, but
it sorta contradicts the main argument of proponents ("any type
implementing __add__ should be sum()-able").

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Ben Hoyt
It seems to me this is something of a pointless discussion -- I highly
doubt the current situation is going to change, and it works very well.
Even if not perfect, sum() is for numbers, sep.join() for strings. However,
I will add one comment:

I'm overall -1 on trying to change the current situation (except for
> adding a join() builtin or str.join class method).


Did you know there actually is a str.join "class method"? I've never
actually seen it used this way, but for people who just can't stand
sep.join(seq), you can always call str.join(sep, seq) -- works in Python 2
and 3:

>>> str.join('.', ['abc', 'def', 'ghi'])
'abc.def.ghi'

This works as a side effect of the fact that you can call methods as
cls.method(instance, args).

-Ben
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Chris Barker - NOAA Federal
> I'm very sympathetic to Steven's explanation that "we
> wouldn't be having this discussion if we used a different operator for
> string concatenation".

Sure -- but just imagine the conversations we could be having instead:
what does bitwise and of a string mean? A bytes object? I could see
it as a character-wise and, for instance  ;-)

My confusion is still this:

Repeated summation of strings has been optimized in cpython even
though it's not the recommended way to solve that problem.

So why not special-case optimize sum() for strings? We already
special-case strings to raise an exception.

It seems pretty pedantic to say: we could make this work well, but we'd
rather chide you for not knowing the "proper" way to do it.

Practicality beats purity?

-Chris




> Although that's not the whole story: in
> practice even numerical sums get split into multiple functions because
> floating point addition isn't associative, and so needs careful
> treatment to preserve accuracy.  At that point I'm strongly +1 on
> abandoning attempts to "rationalize" summation.
>
> I'm not sure how I'd feel about raising an exception if you try to sum
> any iterable containing misbehaved types like float.  But not only
> would that be a Python 4 effort due to backward incompatibility, but
> it sorta contradicts the main argument of proponents ("any type
> implementing __add__ should be sum()-able").
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Terry Reedy

On 8/11/2014 8:26 AM, Ben Hoyt wrote:

It seems to me this is something of a pointless discussion -- I highly
doubt the current situation is going to change, and it works very well.
Even if not perfect, sum() is for numbers, sep.join() for strings.
However, I will add one comment:

I'm overall -1 on trying to change the current situation (except for
adding a join() builtin or str.join class method).


Did you know there actually is a str.join "class method"?


A 'method' is a function accessed as an attribute of a class.
An 'instance method' is a method whose first parameter is an instance of 
the class. str.join is an instance method.  A 'class method', wrapped as 
such with classmethod(), usually by decorating it with @classmethod, 
would take the class as a parameter.



I've never
actually seen it used this way, but for people who just can't stand
sep.join(seq), you can always call str.join(sep, seq) -- works in Python
2 and 3:

 >>> str.join('.', ['abc', 'def', 'ghi'])
'abc.def.ghi'


One could even put 'join = str.join' at the top of a file.

All this is true of *every* instance method.  For instance

int.__add__(1, 2) == 1 .__add__(2) == 1 + 2

True

However, your point that people who cannot stand the abbreviation 
*could* use the full form that is being abbreviated.



In ancient Python, when strings did not have methods, the current string 
methods were functions in the string module. The functions were removed 
in 3.0.  Their continued use in 2.x code is bad for 3.x compatibility, 
so I would not encourage it.


>>> help(string.join)  # 2.7.8
Help on function join in module string:

join(words, sep=' ')
join(list [,sep]) -> string

Return a string composed of the words in list, with
intervening occurrences of sep.  The default separator is a
single space.

'List' is obsolete.  Since sometime before 2.7, 'words' meant an 
iterable of strings.


>>> def digits():
        for i in range(10):
            yield str(i)

>>> string.join(digits(), '')
'0123456789'

Of the string functions, I believe the conversion of join (and its 
synonym 'joinfields') to a method has been the most contentious.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Nick Coghlan
On 12 Aug 2014 03:03, "Chris Barker - NOAA Federal" 
wrote:
>
> My confusion is still this:
>
> Repeated summation of strings has been optimized in cpython even
> though it's not the recommended way to solve that problem.

The quadratic behaviour of repeated str summation is a subtle, silent
error. It *is* controversial that CPython silently optimises some cases of
it away, since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.

It's considered worth the cost, since it dramatically improves the
performance of common naive code in a way that doesn't alter the semantics.

> So why not special case optimize sum() for strings? We are already
> special-case strings to raise an exception.
>
> It seems pretty pedantic to say: we cod make this work well, but we'd
> rather chide you for not knowing the "proper" way to do it.

Yes, that's exactly what this is - a nudge towards the right way to
concatenate strings without incurring quadratic behaviour. We *want* people
to learn that distinction, not sweep it under the rug. That's the other
reason the implicit optimisation is controversial - it hides an important
difference in algorithmic complexity from users.

> Practicality beats purity?

Teaching users the difference between linear time operations and quadratic
ones isn't about purity, it's about passing along a fundamental principle
of algorithm scalability.

We do it specifically for strings because they *do* have an optimised
algorithm available that we can point users towards, and concatenating
multiple strings is common.

Other containers don't tend to be concatenated like that in the first
place, so there's no such check pushing other iterables towards
itertools.chain.
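
For example, flattening a list of lists with itertools.chain is linear,
where sum(list_of_lists, []) copies repeatedly:

>>> import itertools
>>> list(itertools.chain.from_iterable([[1, 2], [3], [4, 5]]))
[1, 2, 3, 4, 5]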

Regards,
Nick.

>
> -Chris
>
>
>
>
> > Although that's not the whole story: in
> > practice even numerical sums get split into multiple functions because
> > floating point addition isn't associative, and so needs careful
> > treatment to preserve accuracy.  At that point I'm strongly +1 on
> > abandoning attempts to "rationalize" summation.
> >
> > I'm not sure how I'd feel about raising an exception if you try to sum
> > any iterable containing misbehaved types like float.  But not only
> > would that be a Python 4 effort due to backward incompatibility, but
> > it sorta contradicts the main argument of proponents ("any type
> > implementing __add__ should be sum()-able").
> >
> > ___
> > Python-Dev mailing list
> > Python-Dev@python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
https://mail.python.org/mailman/options/python-dev/chris.barker%40noaa.gov
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Alexander Belopolsky
On Mon, Aug 11, 2014 at 8:19 PM, Nick Coghlan  wrote:

> Teaching users the difference between linear time operations and quadratic
> ones isn't about purity, it's about passing along a fundamental principle
> of algorithm scalability.


I would understand if this was done in reduce(operator.add, ..) which
indeed spells out the choice of an algorithm, but why should sum() be O(N)
for numbers and O(N**2) for containers?  Would a python implementation
that, for example, optimizes away 0's in sum(list_of_numbers) be
non-compliant with some fundamental principle?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Chris Barker - NOAA Federal
Sorry for the bike shedding here, but:

The quadratic behaviour of repeated str summation is a subtle, silent error.

OK, fair enough. I suppose it would be hard and ugly to catch those
instances and raise an exception pointing users to "".join.

*is* controversial that CPython silently optimises some cases of it away,
since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.

Is there anything in the language spec that says string concatenation is
O(n^2)? Or for that matter any of the performance characteristics of built-in
types? Those strike me as implementation details that SHOULD be particular to
the implementation.

Should we cripple the performance of some operation in Cpython so that it
won't work better than Jython? That seems an odd choice. Then how dare PyPy
make scalar computation faster? People might switch to cPython and not know
they should have been using numpy all along...

It's considered worth the cost, since it dramatically improves the
performance of common naive code in a way that doesn't alter the semantics.

Seems the same argument could be made for sum(list_of_strings).

 > It seems pretty pedantic to say: we could make this work well, but we'd
> rather chide you for not knowing the "proper" way to do it.

Yes, that's exactly what this is - a nudge towards the right way to
concatenate strings without incurring quadratic behaviour.

But if it were optimized, it wouldn't incur quadratic behavior.

We *want* people to learn that distinction, not sweep it under the rug.

But sum() is not inherently quadratic -- that's a limitation of the
implementation. I agree that disallowing it is a good idea given that
behavior, but if it were optimized, there would be no reason to steer
people away.

"".join _could_ be naively written with the same poor performance -- why
should users need to understand why one was optimized and one was not?

That's the other reason the implicit optimisation is controversial - it
hides an important difference in algorithmic complexity from users.

It doesn't hide it -- it eliminates it. I suppose it's good for folks to
understand the implications of string immutability for when they write
their own algorithms, but this wouldn't be considered a good argument for a
poorly performing sort() for instance.

> Practicality beats purity?

Teaching users the difference between linear time operations and quadratic
ones isn't about purity, it's about passing along a fundamental principle
of algorithm scalability.

That is a very important lesson to learn, sure, but python is not only a
teaching language. People will need to learn those lessons at some point,
this one feature makes little difference.

We do it specifically for strings because they *do* have an optimised
algorithm available that we can point users towards, and concatenating
multiple strings is common.

Sure, but I think all that does is teach people about a cpython specific
implementation -- and I doubt naive users get any closer to understanding
algorithmic complexity -- all they learn is you should use string.join().

Oh well, not really that big a deal.

-Chris
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes:

 > Is there anything in the language spec that says string concatenation is
 > O(n^2)? Or for that matter any of the performs characteristics of build in
 > types? Those striker as implementation details that SHOULD be particular to
 > the implementation.

Container concatenation isn't quadratic in Python at all.  The naive
implementation of sum() as a loop repeatedly calling __add__ is
quadratic for them.  Strings (and immutable containers in general) are
particularly horrible, as they don't have __iadd__.

You could argue that sum() being a function of an iterable isn't just
a calling convention for a loop encapsulated in a function, but rather
a completely different kind of function that doesn't imply anything
about the implementation, and therefore that it should dispatch on
type(it).  But explicitly dispatching on type(x) is yucky (what if
somebody wants to sum a different type not currently recognized by the
sum() builtin?) so, obviously, we should define a standard __sum__
dunder!  IMO we'd also want a homogeneous_iterable ABC, and a concrete
homogeneous_iterable_of_TYPE for each sum()-able TYPE to help users
catch bugs injecting the wrong type into an iterable_of_TYPE.
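
A rough sketch of what such dispatch might look like (entirely
hypothetical -- no __sum__ protocol exists today):

def generic_sum(iterable, start=0):
    # Hypothetical: let the start value's type supply an optimized __sum__,
    # otherwise fall back to the naive left-to-right loop.
    dunder = getattr(type(start), '__sum__', None)
    if dunder is not None:
        return dunder(start, iterable)
    total = start
    for item in iterable:
        total = total + item
    return total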

But this still sucks.  Why?  Because obviously we'd want the
attractive nuisance of "if you have __add__, there's a default
definition of __sum__" (AIUI, this is what bothers Alexander most
about the current situation, at least of the things he's mentioned, I
can really sympathize with his dislike).  And new Pythonistas and lazy
programmers who only intend to use sum() on "small enough" iterables
will use the default, and their programs will appear to hang on
somewhat larger iterable, or a realtime requirement will go
unsatisfied when least expected, or   If we *don't* have that
property for sum(), ugh!  Yuck!  Same old same old!  (IMHO, YMMV of
course)

It's possible that Python could provide some kind of feature that
would allow an optimized sum function for every type that has __add__,
but I think this will take a lot of thinking.  *Somebody* will do it
(I don't think anybody is +1 on restricting sum() to a subset of types
with __add__).  I just think we should wait until that somebody appears.

 > Should we cripple the performance of some operation in Cpython so that it
 > won't work better that Jython?

Nobody is crippling operations.  We're prohibiting use of a *name* for
an operation that is associated (strongly so, in my mind) with an
inefficient algorithm in favor of the *same operation* by a different
name (which has no existing implementation, and therefore Python
implementers are responsible for implementing it efficiently).  Note:
the "inefficient" algorithm isn't inefficient for integers, and it
isn't inefficient for numbers in general (although it's inaccurate for
some classes of numbers).

 > Seems the same argument [that Python language doesn't prohibit
 > optimizations in particular implementations just because they
 > aren't made in others] could be made for sum(list_of_strings).

It could.  But then we have to consider special-casing every builtin
type that provides __add__, and we impose an unobvious burden on user
types that provide __add__.

 > > It seems pretty pedantic to say: we could make this work well,
 > > but we'd rather chide you for not knowing the "proper" way to do
 > > it.

Nobody disagrees.  But backward compatibility gets in the way.

 > But sum() is not inherently quadratic -- that's a limitation of the
 > implementation.

But the faulty implementation is the canonical implementation, the
only one that can be defined directly in terms of __add__, and it is
efficient for non-container types.[1]

 > "".join _could_ be naively written with the same poor performance
 > -- why should users need to understand why one was optimized and
 > one was not?

Good question.  They shouldn't -- thus the prohibition on sum()ing
strings.

 > That is a very import a lesson to learn, sure, but python is not
 > only a teaching language. People will need to learn those lessons
 > at some point, this one feature makes little difference.

No, it makes a big difference.  If you can do something, then it's OK
to do it, is something Python tries to implement.  If sum() works for
everything with an __add__, given current Python language features
some people are going to end up with very inefficient code and it will
bite some of them (and not necessarily the authors!) at some time.

If it doesn't work for every type with __add__, why not?  You'll end
up playing whack-a-mole with type prohibitions.  Ugh.

 > Sure, but I think all that does is teach people about a cpython specific
 > implementation -- and I doubt naive users get any closer to understanding
 > algorithmic complexity -- all they learn is you should use string.join().
 > 
 > Oh well, not really that big a deal.

Not to Python.  Maybe not to you.  But I've learned a lot about
Pythonic ways of doing things trying to channe

Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Ethan Furman

On 08/11/2014 08:50 PM, Stephen J. Turnbull wrote:

Chris Barker - NOAA Federal writes:


It seems pretty pedantic to say: we could make this work well,
but we'd rather chide you for not knowing the "proper" way to do
it.


Nobody disagrees.  But backward compatibility gets in the way.


Something that currently doesn't work, starts to.  How is that a backward 
compatibility problem?

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Stephen J. Turnbull
Ethan Furman writes:
 > On 08/11/2014 08:50 PM, Stephen J. Turnbull wrote:
 > > Chris Barker - NOAA Federal writes:
 > >
 > >> It seems pretty pedantic to say: we could make this work well,
 > >> but we'd rather chide you for not knowing the "proper" way to do
 > >> it.
 > >
 > > Nobody disagrees.  But backward compatibility gets in the way.
 > 
 > Something that currently doesn't work, starts to.  How is that a
 > backward compatibility problem?

I'm referring to removing the unnecessary information that there's a
better way to do it, and simply raising an error (as in Python 3.2,
say) which is all a RealProgrammer[tm] should ever need!

That would be a regression and backward incompatible.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Nick Coghlan
On 12 Aug 2014 11:21, "Chris Barker - NOAA Federal" 
wrote:
>
> Sorry for the bike shedding here, but:
>
>> The quadratic behaviour of repeated str summation is a subtle, silent
error.
>
> OK, fair enough. I suppose it would be hard and ugly to catch those
instances and raise an exception pointing users to "".join.
>>
>> *is* controversial that CPython silently optimises some cases of it
away, since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.
>
> Is there anything in the language spec that says string concatenation is
O(n^2)? Or for that matter any of the performs characteristics of build in
types? Those striker as implementation details that SHOULD be particular to
the implementation.

If you implement strings so they have multiple data segments internally (as
is the case for StringIO these days), yes, you can avoid quadratic time
concatenation behaviour. Doing so makes it harder to meet other complexity
expectations (like O(1) access to arbitrary code points), and isn't going
to happen in CPython regardless due to C API backwards compatibility
constraints.

For the explicit loop with repeated concatenation, we can't say "this is
slow, don't do it". People do it anyway, so we've opted for the "fine, make
it as fast as we can" option as being preferable to an obscure and
relatively hard to debug performance problem.

For sum(), we have the option of being more direct and just telling people
Python's answer to the string concatenation problem (i.e. str.join). That
is decidedly *not* the series of operations described in sum's
documentation as "Sums start and the items of an iterable from left to
right and returns the total."

Regards,
Nick.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Armin Rigo
Hi all,

The core of the matter is that if we repeatedly __add__ strings from a
long list, we get O(n**2) behavior.  From one point of view, the
reason is that the additions proceed in left-to-right order.  Indeed,
sum() could proceed in a more balanced tree-like order: from [x0, x1,
x2, x3, ...], reduce the list to [x0+x1, x2+x3, ...]; then repeat
until there is only one item in the final list.  This order ensures
that sum(list_of_strings) is at worst O(n log n).  It might be in
practice close enough from linear to not matter.  It also improves a
lot the precision of sum(list_of_floats) (though not reaching the same
precision levels of math.fsum()).
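
A rough sketch of that reduction order (an illustration only, not what the
builtin does):

def pairwise_sum(items):
    # Reduce [x0, x1, x2, x3, ...] to [x0+x1, x2+x3, ...] until one item remains.
    items = list(items)
    if not items:
        raise ValueError("pairwise_sum() of an empty sequence")
    while len(items) > 1:
        paired = [items[i] + items[i + 1] for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:              # an odd last element carries over unchanged
            paired.append(items[-1])
        items = paired
    return items[0]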


Just a thought,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Chris Barker
On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull 
wrote:

> I'm referring to removing the unnecessary information that there's a
>  better way to do it, and simply raising an error (as in Python 3.2,
> say) which is all a RealProgrammer[tm] should ever need!
>

I can't imagine anyone is suggesting that -- disallow it, but don't tell
anyone why?

The only thing that is remotely on the table here is:

1) remove the special case for strings -- buyer beware -- but consistent
and less "ugly"

2) add a special case for strings that is fast and efficient -- may be as
simple as calling "".join() under the hood --no more code than the
exception check.

And I doubt anyone really is pushing for anything but (2)
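
A minimal sketch of what (2) could look like, assuming the special case
simply delegates to str.join (this is not how the builtin is implemented):

def sum_with_str(iterable, start=0):
    # Hypothetical variant of sum() that fast-paths strings via ''.join
    # instead of raising a TypeError.
    if isinstance(start, str):
        return start + ''.join(iterable)
    total = start
    for item in iterable:
        total = total + item
    return total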

Stephen Turnbull wrote:

>   IMO we'd also want a homogeneous_iterable ABC


Actually, I've thought for years that that would open the door to a lot of
optimizations -- but that's a much broader question than sum(). I even
brought it up probably over ten years ago -- but no one was the least bit
interested -- nor are they now -- I know this was a rhetorical suggestion
to make the point about what not to do

  Because obviously we'd want the
> attractive nuisance of "if you have __add__, there's a default
> definition of __sum__"


now I'm confused -- isn't that exactly what we have now?

It's possible that Python could provide some kind of feature that
> would allow an optimized sum function for every type that has __add__,
> but I think this will take a lot of thinking.


does it need to be every type? As it is the common ones work fine already
except for strings -- so if we add an optimized string sum() then we're
done.

 *Somebody* will do it
> (I don't think anybody is +1 on restricting sum() to a subset of types
> with __add__).


uhm, that's exactly what we have now -- you can use sum() with anything
that has an __add__, except strings. And by that logic, if we thought there
were other inefficient use cases, we'd restrict those too.

But users can always define their own classes that have a __sum__ and are
really inefficient -- so unless sum() becomes just for a certain subset of
built-in types -- does anyone want that? Then we are back to the current
situation:

sum() can be used for any type that has an __add__ defined.

But naive users are likely to try it with strings, and that's bad, so we
want to prevent that, and have a special case check for strings.

What I fail to see is why it's better to raise an exception and point users
to a better way, than to simply provide an optimization so that it's a moot
issue.

The only justification offered here is that will teach people that summing
strings (and some other objects?) is order(N^2) and a bad idea. But:

a) Python's primary purpose is practical, not pedagogical (not that it
isn't great for that)

b) I doubt any naive users learn anything other than "I can't use sum() for
strings, I should use "".join()". Will they make the leap to "I shouldn't
use string concatenation in a loop, either"? Oh, wait, you can use string
concatenation in a loop -- that's been optimized. So will they learn: "some
types of objects have poor performance with repeated concatenation and
shouldn't be used with sum(). So if I write such a class, and want to sum
them up, I'll need to write an optimized version of that code"?

I submit that no naive user is going to get any closer to a proper
understanding of algorithmic Order behavior from this small hint. Which
leaves no reason to prefer an Exception to an optimization.

One other point: perhaps this will lead a naive user into thinking --
"sum() raises an exception if I try to use it inefficiently, so it must be
OK to use for anything that doesn't raise an exception" -- that would be a
bad lesson to mis-learn

-Chris

PS:
Armin Rigo wrote:

> It also improves a
> lot the precision of sum(list_of_floats) (though not reaching the same
> precision levels of math.fsum()).


while we are at it, having the default sum() for floats be fsum() would be
nice -- I'd rather the default was better accuracy, lower performance. Folks
that really care about performance could call math.fastsum(), or really,
use numpy...
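
For example, a classic cancellation case:

>>> import math
>>> vals = [1e16, 1.0, -1e16]
>>> sum(vals)
0.0
>>> math.fsum(vals)
1.0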

This does turn sum() into a function that does type-based dispatch, but
isn't python full of those already? do something special for the types you
know about, call the generic dunder method for the rest.



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Stefan Richthofer

I know, I have nothing to decide here, since I'm no contributor and just a silent watcher on this list.

However I just wanted to point out I fully agree with Chris Barker's position. Couldn't have stated
it better. Performance should be an interpreter-implementation issue, not a language issue.

> 2) add a special case for strings that is fast and efficient -- may be as simple as calling "".join() under the hood --no more code than the exception check.

I would give it a +1 if my opinion counts anything.

Cheers

Stefan

Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Nikolaus Rath
Chris Barker  writes:
> What I fail to see is why it's better to raise an exception and point users
> to a better way, than to simply provide an optimization so that it's a mute
> issue.
>
> The only justification offered here is that will teach people that summing
> strings (and some other objects?) is order(N^2) and a bad idea. But:
>
> a) Python's primary purpose is practical, not pedagogical (not that it
> isn't great for that)
>
> b) I doubt any naive users learn anything other than "I can't use sum() for
> strings, I should use "".join()". Will they make the leap to "I shouldn't
> use string concatenation in a loop, either"? Oh, wait, you can use string
> concatenation in a loop -- that's been optimized. So will they learn: "some
> types of object shave poor performance with repeated concatenation and
> shouldn't be used with sum(). So If I write such a class, and want to sum
> them up, I'll need to write an optimized version of that code"?
>
> I submit that no naive user is going to get any closer to a proper
> understanding of algorithmic Order behavior from this small hint. Which
> leaves no reason to prefer an Exception to an optimization.
>
> One other point: perhaps this will lead a naive user into thinking --
> "sum() raises an exception if I try to use it inefficiently, so it must be
> OK to use for anything that doesn't raise an exception" -- that would be a
> bad lesson to mis-learn

AOL to that.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-12 Thread Stephen J. Turnbull
Redirecting to python-ideas, so trimming less than I might.

Chris Barker writes:
 > On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull 
 > wrote:
 > 
 > > I'm referring to removing the unnecessary information that there's a
 > >  better way to do it, and simply raising an error (as in Python 3.2,
 > > say) which is all a RealProgrammer[tm] should ever need!
 > >
 > 
 > I can't imagine anyone is suggesting that -- disallow it, but don't tell
 > anyone why?

As I said, it's a regression.  That's exactly the behavior in Python 3.2.

 > The only thing that is remotely on the table here is:
 > 
 > 1) remove the special case for strings -- buyer beware -- but consistent
 > and less "ugly"

It's only consistent if you believe that Python has strict rules for
use of various operators.  It doesn't, except as far as they are
constrained by precedence.  For example, I have an application where I
add bytestrings bytewise modulo N <= 256, and concatenate them.  In
fact I use function call syntax, but the obvious operator syntax is
'+' for the bytewise addition, and '*' for the concatenation.
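
A rough sketch of that kind of class (the names and the fixed modulus here are invented, not taken from the application described above):

    class ModBytes:
        """'+' adds bytewise modulo 256, '*' concatenates."""
        def __init__(self, data):
            self.data = bytes(data)

        def __add__(self, other):
            a, b = self.data, other.data
            if len(a) < len(b):
                a, b = b, a
            b = b + bytes(len(a) - len(b))           # zero-pad the shorter operand
            return ModBytes(bytes((x + y) % 256 for x, y in zip(a, b)))

        def __mul__(self, other):
            return ModBytes(self.data + other.data)  # '*' spelled as concatenation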

It's not in the Zen, but I believe in the maxim "If it's worth doing,
it's worth doing well."  So for me, 1) is out anyway.

 > 2) add a special case for strings that is fast and efficient -- may be as
 > simple as calling "".join() under the hood --no more code than the
 > exception check.

Sure, but what about all the other immutable containers with __add__
methods?  What about mappings with key-wise __add__ methods whose
values might be immutable but have __add__ methods?  Where do you stop
with the special-casing?  I consider this far more complex and ugly
than the simple "sum() is for numbers" rule (and even that is way too
complex considering accuracy of summing floats).

 > And I doubt anyone really is pushing for anything but (2)

I know that, but I think it's the wrong solution to the problem (which
is genuine IMO).  The right solution is something generic, possibly a
__sum__ method.  The question is whether that leads to too much work
to be worth it (eg, "homogeneous_iterable").
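
As a proof of concept only (no such protocol exists; all names here are invented), the generic hook might look like this: sum() asks the start value's type for a __sum__ method and falls back to repeated __add__ otherwise.

    def generic_sum(iterable, start):
        hook = getattr(type(start), "__sum__", None)
        if hook is not None:
            return hook(start, iterable)      # type-specific, hopefully O(N)
        total = start
        for item in iterable:
            total = total + item              # generic fallback, may be O(N**2)
        return total

    class Str(str):
        def __sum__(self, iterable):
            return self + "".join(iterable)   # one join instead of N copies

    print(generic_sum(["a", "b", "c"], Str("")))   # 'abc'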

 > > Because obviously we'd want the attractive nuisance of "if you
 > > have __add__, there's a default definition of __sum__"
 > 
 > now I'm confused -- isn't that exactly what we have now?

Yes and my feeling (backed up by arguments that I admit may persuade
nobody but myself) is that what we have now kinda sucks[tm].  It
seemed like a good idea when I first saw it, but then, my apps don't
scale to where the pain starts in my own usage.

 > > It's possible that Python could provide some kind of feature that
 > > would allow an optimized sum function for every type that has
 > > __add__, but I think this will take a lot of thinking.
 > 
 > does it need to be every type? As it is the common ones work fine already
 > except for strings -- so if we add an optimized string sum() then we're
 > done.

I didn't say provide an optimized sum(), I said provide a feature
enabling people who want to optimize sum() to do so.  So yes, it needs
to be every type (the optional __sum__ method is a proof of concept,
modulo it actually being implementable ;-).

 > > *Somebody* will do it (I don't think anybody is +1 on restricting
 > > sum() to a subset of types with __add__).
 > 
 > uhm, that's exactly what we have now

Exactly.  Who's arguing that the sum() we have now is a ticket to
Paradise?  I'm just saying that there's probably somebody out there
negative enough on the current situation to come up with an answer
that I think is general enough (and I suspect that python-dev
consensus is that demanding, too).

 > sum() can be used for any type that has an __add__ defined.

I'd like to see that be mutable types with __iadd__.
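
A hypothetical sketch of what that could mean in practice (not a concrete proposal from this thread): accumulate into a copy of the start value with +=, so mutable types extend in place.

    import copy

    def isum(iterable, start):
        total = copy.copy(start)    # don't mutate the caller's start value
        for item in iterable:
            total += item           # uses __iadd__ when defined, else __add__
        return total

    print(isum([[1, 2], [3], [4, 5]], []))   # [1, 2, 3, 4, 5]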

 > What I fail to see is why it's better to raise an exception and
 > point users to a better way, than to simply provide an optimization
 > so that it's a moot issue.

Because inefficient sum() is an attractive nuisance, easy to overlook,
and likely to bite users other than the author.

 > The only justification offered here is that it will teach people that summing
 > strings (and some other objects?)

Summing tuples works (with appropriate start=tuple()).  Haven't
benchmarked, but I bet that's O(N^2).
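
A rough check of that guess (not a careful benchmark; absolute times depend on the machine) -- doubling the input roughly quadruples the time, as O(N^2) predicts:

    import timeit

    for n in (2000, 4000, 8000):
        tuples = [(i,) for i in range(n)]
        t = timeit.timeit(lambda: sum(tuples, ()), number=3)
        print(n, round(t, 3))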

 > is order(N^2) and a bad idea. But:
 > 
 > a) Python's primary purpose is practical, not pedagogical (not that it
 > isn't great for that)

My argument is that in practical use sum() is a bad idea, period,
until you bone up on the types and applications where it *does* work.
N.B. It doesn't even work properly for numbers (inaccurate for floats).

 > b) I doubt any naive users learn anything other than "I can't use sum() for
 > strings, I should use "".join()".

For people who think that special-casing strings is a good idea, I
think this is about as much benefit as you can expect.  Why go
farther?<0.5 wink/>

 > I submit that no naive user is going to get any closer to a proper
 > understanding of algorithmic Order behavior from this

Re: [Python-Dev] sum(...) limitation

2014-08-13 Thread Ronald Oussoren

On 12 Aug 2014, at 10:02, Armin Rigo  wrote:

> Hi all,
> 
> The core of the matter is that if we repeatedly __add__ strings from a
> long list, we get O(n**2) behavior.  For one point of view, the
> reason is that the additions proceed in left-to-right order.  Indeed,
> sum() could proceed in a more balanced tree-like order: from [x0, x1,
> x2, x3, ...], reduce the list to [x0+x1, x2+x3, ...]; then repeat
> until there is only one item in the final list.  This order ensures
> that sum(list_of_strings) is at worst O(n log n).  It might be in
> practice close enough from linear to not matter.  It also improves a
> lot the precision of sum(list_of_floats) (though not reaching the same
> precision levels of math.fsum()).
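
A minimal sketch of the balanced reduction described in the quoted paragraph (an illustration only, not CPython's actual code): pair up neighbours until one element remains.

    def pairwise_sum(items, start):
        parts = list(items)
        if not parts:
            return start
        while len(parts) > 1:
            # add neighbouring pairs; an odd trailing element is carried through
            parts = [parts[i] + parts[i + 1] if i + 1 < len(parts) else parts[i]
                     for i in range(0, len(parts), 2)]
        return start + parts[0]

    print(pairwise_sum(["a", "b", "c", "d", "e"], ""))   # 'abcde'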

I wonder why nobody has mentioned previous year’s discussion of the same issue 
yet: http://marc.info/?l=python-ideas&m=137359619831497&w=2

Maybe someone can write a PEP about this that can be pointed to when the question 
is discussed again next summer ;-)

Ronald

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-13 Thread Chris Barker
On Tue, Aug 12, 2014 at 11:21 PM, Stephen J. Turnbull 
wrote:

> Redirecting to python-ideas, so trimming less than I might.


reasonable enough -- you are introducing some more significant ideas for
changes.

I've said all I have to say about this -- I don't seem to see anything
encouraging from core devs, so I guess that's it.

Thanks for the fun bike-shedding...

-Chris





> Chris Barker writes:
>  > On Mon, Aug 11, 2014 at 11:07 PM, Stephen J. Turnbull <
> step...@xemacs.org>
>  > wrote:
>  >
>  > > I'm referring to removing the unnecessary information that there's a
>  > >  better way to do it, and simply raising an error (as in Python 3.2,
>  > > say) which is all a RealProgrammer[tm] should ever need!
>  > >
>  >
>  > I can't imagine anyone is suggesting that -- disallow it, but don't tell
>  > anyone why?
>
> As I said, it's a regression.  That's exactly the behavior in Python 3.2.
>
>  > The only thing that is remotely on the table here is:
>  >
>  > 1) remove the special case for strings -- buyer beware -- but consistent
>  > and less "ugly"
>
> It's only consistent if you believe that Python has strict rules for
> use of various operators.  It doesn't, except as far as they are
> constrained by precedence.  For example, I have an application where I
> add bytestrings bytewise modulo N <= 256, and concatenate them.  In
> fact I use function call syntax, but the obvious operator syntax is
> '+' for the bytewise addition, and '*' for the concatenation.
>
> It's not in the Zen, but I believe in the maxim "If it's worth doing,
> it's worth doing well."  So for me, 1) is out anyway.
>
>  > 2) add a special case for strings that is fast and efficient -- may be
> as
>  > simple as calling "".join() under the hood --no more code than the
>  > exception check.
>
> Sure, but what about all the other immutable containers with __add__
> methods?  What about mappings with key-wise __add__ methods whose
> values might be immutable but have __add__ methods?  Where do you stop
> with the special-casing?  I consider this far more complex and ugly
> than the simple "sum() is for numbers" rule (and even that is way too
> complex considering accuracy of summing floats).
>
>  > And I doubt anyone really is pushing for anything but (2)
>
> I know that, but I think it's the wrong solution to the problem (which
> is genuine IMO).  The right solution is something generic, possibly a
> __sum__ method.  The question is whether that leads to too much work
> to be worth it (eg, "homogeneous_iterable").
>
>  > > Because obviously we'd want the attractive nuisance of "if you
>  > > have __add__, there's a default definition of __sum__"
>  >
>  > now I'm confused -- isn't that exactly what we have now?
>
> Yes and my feeling (backed up by arguments that I admit may persuade
> nobody but myself) is that what we have now kinda sucks[tm].  It
> seemed like a good idea when I first saw it, but then, my apps don't
> scale to where the pain starts in my own usage.
>
>  > > It's possible that Python could provide some kind of feature that
>  > > would allow an optimized sum function for every type that has
>  > > __add__, but I think this will take a lot of thinking.
>  >
>  > does it need to be every type? As it is the common ones work fine
> already
>  > except for strings -- so if we add an optimized string sum() then we're
>  > done.
>
> I didn't say provide an optimized sum(), I said provide a feature
> enabling people who want to optimize sum() to do so.  So yes, it needs
> to be every type (the optional __sum__ method is a proof of concept,
> modulo it actually being implementable ;-).
>
>  > > *Somebody* will do it (I don't think anybody is +1 on restricting
>  > > sum() to a subset of types with __add__).
>  >
>  > uhm, that's exactly what we have now
>
> Exactly.  Who's arguing that the sum() we have now is a ticket to
> Paradise?  I'm just saying that there's probably somebody out there
> negative enough on the current situation to come up with an answer
> that I think is general enough (and I suspect that python-dev
> consensus is that demanding, too).
>
>  > sum() can be used for any type that has an __add__ defined.
>
> I'd like to see that be mutable types with __iadd__.
>
>  > What I fail to see is why it's better to raise an exception and
>  > point users to a better way, than to simply provide an optimization
 >  > so that it's a moot issue.
>
> Because inefficient sum() is an attractive nuisance, easy to overlook,
> and likely to bite users other than the author.
>
 >  > The only justification offered here is that it will teach people that
> summing
>  > strings (and some other objects?)
>
> Summing tuples works (with appropriate start=tuple()).  Haven't
> benchmarked, but I bet that's O(N^2).
>
>  > is order(N^2) and a bad idea. But:
>  >
>  > a) Python's primary purpose is practical, not pedagogical (not that it
>  > isn't great for that)
>
> My argument is that in practical use sum() is a bad idea, period,
> until y

Re: [Python-Dev] sum() in math module not duplicated?

2008-06-19 Thread Raymond Hettinger

We're still working on the implementation details for math.sum().
When it's finished, the cmath equivalent will be added.
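
For context, the function under discussion is what eventually shipped as math.fsum(); a quick illustration of the accuracy gap it closes (outputs shown are for a typical IEEE-754 double build):

    import math

    values = [0.1] * 10
    print(sum(values))        # 0.9999999999999999
    print(math.fsum(values))  # 1.0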

Raymond

- Original Message - 
From: "A.M. Kuchling" <[EMAIL PROTECTED]>

To: 
Sent: Thursday, June 19, 2008 7:16 PM
Subject: [Python-Dev] sum() in math module not duplicated?



In the comments before the implementation of sum()
in mathmodule.c, note 4 says:

  Note 4: A similar implementation is in Modules/cmathmodule.c.
  Be sure to update both when making changes.

I can't find a sum implementation or similar code in cmathmodule.c.
Did someone intend to port the sum function to cmath, or was it copied
and then taken out?  I'm wondering if this note should simply be
removed.

--amk
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/python%40rcn.com

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation - temporary elision take 2

2014-08-11 Thread Julian Taylor
On 04.08.2014 22:22, Jim J. Jewett wrote:
> 
> 
> 
> Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in
> https://mail.python.org/pipermail/python-dev/2014-August/135623.html ):
> 
> 
>> Andrea Griffini  wrote:
> 
>>>However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
> 
>> hm could this be a pure python case that would profit from temporary
>> elision [ https://mail.python.org/pipermail/python-dev/2014-June/134826.html 
>> ]?
> 
>> lists could declare the tp_can_elide slot and call list.extend on the
>> temporary during its tp_add slot instead of creating a new temporary.
>> extend/realloc can avoid the copy if there is free memory available
>> after the block.
> 
> Yes, with all the same problems.
> 
> When dealing with a complex object, how can you be sure that __add__
> won't need access to the original values during the entire computation?
> It works with matrix addition, but not with matrix multiplication.
> Depending on the details of the implementation, it could even fail for
> a sort of sliding-neighbor addition similar to the original justification.

The C-extension object knows what its add slot does. An object that
cannot elide would simply always return 0, indicating to Python not to
call the in-place variant.
E.g. the numpy __matmul__ operator would never tell Python that it can
work in place, but __add__ would (if the arguments allow it).
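
A pure-Python illustration of the transformation that eliding the temporary would let the interpreter apply automatically for the sum(list_of_lists, []) case (helper names invented here):

    def concat_with_add(chunks):
        total = []
        for chunk in chunks:
            total = total + chunk   # a brand-new list every pass: O(N**2) overall
        return total

    def concat_with_iadd(chunks):
        total = []
        for chunk in chunks:
            total += chunk          # list.__iadd__ extends in place: amortized O(1) per item
        return total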

We may have found a way to do it without direct help from Python, but
it involves reading and storing the current instruction of the frame
object to figure out whether it is called directly from the
interpreter.
unfinished patch to numpy, see the can_elide_temp function:
https://github.com/numpy/numpy/pull/4322.diff
Probably not the best way, as this is hardly intended use of the Python
C-API, but assuming there is no overlooked issue with this approach it
could be a good workaround for known-good Python versions.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com