Steven D'Aprano, 20.01.2010 07:12:
> On Wed, 20 Jan 2010 05:25:22 +0100, Alf P. Steinbach wrote:
>> That is a good argument for not doing the expanding buffer thing.
>> But such buffers may be generally present anyway, resulting from
>> optimization of "+".
>
> As near as I can determine, the CPython optimization you are referring
> to doesn't use the "double the buffer when needed" technique. It
> operates on a completely different strategy. As near as I can tell (as
> a non-C speaker), it resizes the string in place to the size actually
> needed, thus reducing the amount of copying needed.
>
>> Using CPython 2.6.4 in Windows XP:
> [snip time measurements]
>> Here the linear increase of times indicates that the "+" is being
>> optimized using an expanding buffer for the string.
>
> Be careful of jumping to the conclusion from timings that a certain
> algorithm is used. All you can really tell is that, whatever the
> algorithm is, it has such-and-such big-O behaviour.
Which is particularly tricky here, since the effective algorithm depends
more on the OS than on the code in CPython. The better timings come from
the fact that the OS does *not* need to copy the buffer on each
iteration, but does smarter things when asked to enlarge the buffer. If
you ran the benchmark on an OS that *did* copy the buffer each time, the
runtime really would be quadratic.

BTW, I think it would actually be worth trying to apply the same
approach to str.join() if the argument is not a sequence (obviously
followed by a benchmark on different platforms).

Stefan

--
http://mail.python.org/mailman/listinfo/python-list
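For anyone who wants to reproduce the kind of measurement being debated
above, here is a rough benchmark sketch (written in modern Python 3
syntax for convenience; the thread itself used CPython 2.6). It times
repeated "+=" against "".join() at doubling sizes, so you can see on
your own platform whether the concatenation times grow roughly linearly
(resize-in-place working) or quadratically (buffer copied each time).
The sizes and repeat counts are arbitrary choices, and absolute numbers
will vary by OS and allocator:

```python
# Rough timing sketch: repeated "+=" vs. "".join().
# Whether "+=" stays linear depends on CPython's in-place resize
# optimization AND on how the OS/allocator handles enlarging a buffer,
# so treat the numbers as indicative only.
import timeit

def concat_plus(n):
    # Build a string of n characters by repeated concatenation.
    s = ""
    for _ in range(n):
        s += "x"
    return s

def concat_join(n):
    # Build the same string via join, which is linear by construction.
    return "".join("x" for _ in range(n))

if __name__ == "__main__":
    for n in (10000, 20000, 40000):
        t_plus = timeit.timeit(lambda: concat_plus(n), number=5)
        t_join = timeit.timeit(lambda: concat_join(n), number=5)
        # If t_plus roughly doubles as n doubles, "+=" is behaving
        # linearly on this platform; if it quadruples, it is quadratic.
        print("n=%6d  +=: %.4fs  join: %.4fs" % (n, t_plus, t_join))
```

Note that the in-place resize only kicks in when the string on the left
of "+=" has a reference count of one, which is exactly why timing alone
cannot tell you which algorithm is in play, only its observed growth rate.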