[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

Chris Angelico Thu, 02 Apr 2020 00:28:29 -0700

On Thu, Apr 2, 2020 at 5:58 PM Steven D'Aprano <[email protected]> wrote:
>
> On Thu, Apr 02, 2020 at 03:37:34PM +1100, Chris Angelico wrote:
> > On Thu, Apr 2, 2020 at 3:27 PM Steven D'Aprano <[email protected]> wrote:
> > > Or maybe that's just an argument that no solution is going to solve
> > > *every* problem. What do we do about people who write this:
> > >
> > >     buf = f'{buf}{substring}'
> > >
> > > inside a loop? We can't fix everyone's code with one change.
> > >
> >
> > I don't know whether your point was that this is bad code and can't be
> > optimized, or that it's good code but still can't be optimized by this
> > proposal.
>
> Neither. It's that anyone building a string like that isn't the target
> of this proposal. I could have given various examples, I just happened
> to pick f-string:
>
>
>     buf = '%s%s' % (buf, substring)
>     buf = '{}{}'.format(buf, substring)
>     buf = ''.join(itertools.chain(buf, substring))
>
> Put any of them into a loop, and they are likely to be exceedingly slow
> for large N. But fixing them is not part of Paul's proposal.


And they'd all be equivalent to the consideration I'm talking about
(it's not f-string specific).

> > But if the former, then I put it to you that this isn't
> > actually bad code.
>
> Repeated string concatenation doesn't suddenly become efficient just
> because you wave a magic f-string at it. That appends substring to the
> buffer each time through the loop, giving quadratic performance.

This is true; however, they're all optimizable in different ways.
CPython happens to have an optimization for the += case, but it's
equally possible for some other form to have an optimization.

> On my computer, appending a single character 'a' to the buffer each
> time, I get:
>
> 10_000 loops, 0.4 seconds  (actual time)
> 50_000 loops, 9.8 seconds  # expect 5*0.4 = 2 seconds
> 100_000 loops 39 seconds   # expect 10*0.4 = 4 seconds
> 200_000 loops 160 seconds  # expect 20*0.4 = 8 seconds
>
>
> The expected times are assuming that the time is proportional to the
> number of elements, i.e. O(N). If we assume O(N**2) then the expected
> times would be 10, 40 and 160 seconds, so this is a textbook example of
> quadratic slowdown.
>
> (At least on my computer.)
>

Yep. Because CPython doesn't optimize any of them. So it's very
interpreter-specific to take advantage of +=, and would be identically
interpreter-specific to take advantage of any other form. This
proposal continues to favour the += spelling by making it higher
performing.

ChrisA
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/MRHSZ5LGITM5K4VW4D6Y3IF6TXCLTPMK/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

Reply via email to