New issue 1852: sum(arrays,[]) is slow
https://bitbucket.org/pypy/pypy/issue/1852/sum-arrays-is-slow

Anton Dubrau:

sum(strings,"") is very slow (unnecessarily so), but at least there's the 
"".join(strings) function.
when appending a list of arrays together in a similar fashion, there's no join 
method.

Looping over an array and growing a result array using extend is much faster, 
but still kind of slow (it appears it's superlinear).

The fastest way seems to be to pre-allocate a result array and copy the values 
into that, but it's kind of cumbersome.

I feel that sum(arrays,[]) (and incidentally sum(strings,"")) is idiomatic, and 
pypy should specialize on it. It can also easily avoid re-allocations in 
theory, compared to using extend.
Clearly, the time difference between the different approaches is unreasonable:

some profiling:

def sumExtend(arrays):
    result = []
    for a in arrays:
        result.extend(a)
    return result

def sumPreallocate(arrays):
    length = sum(len(a) for a in arrays)
    result = [0]*length
    index = 0;
    for a in arrays:
        result[index:index+len(a)] = a
        index += len(a)
    return result

>> arrays = [[i%10]*i for i in range(10000)]
>> %time result = sum(arrays,[])
cancelled after two and half hours ... may take days?

>> %time _ = sumPreallocate(arrays)
Wall time: 588 ms

>> %time _ = sumExtend(arrays)
Wall time: 1min 42s


_______________________________________________
pypy-issue mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-issue

Reply via email to