On Mar 30, 2020, at 22:03, Steven D'Aprano <st...@pearwood.info> wrote:
> 
> On Mon, Mar 30, 2020 at 01:59:42PM -0700, Andrew Barnert via Python-ideas 
> wrote:
> 
> [...]
>> When you call getvalue() it then builds a Py_UCS4*
>> representation that’s in this case 4x the size of the final string 
>> (since your string is pure ASCII and will be stored in UCS1, not 
>> UCS4). And then there’s the final string.
>> 
>> So, if this memory issue makes join unacceptable, it makes your 
>> optimization even more unacceptable.
> 
> You seem to be talking about a transient spike in memory usage, as the 
> UCS4 string is built then disposed of. Paul seems to be talking about 
> holding on to large numbers of substrings for long periods of time, 
> possibly minutes or hours or even days in the case of a long running 
> process.

But StringIO has the same long-term cost as the list, _plus_ a transient spike. 
There’s no way that can be better than just the same long-term cost. You can 
try to argue that it’s not that much worse, or that it isn’t worse in some 
cases, or that it could be optimized to not be as much worse; I’ll snip out all 
of those arguments because even if you’re right, it’s still not better. So this 
proposal amounts to changing Python, so that we can then get everyone to stop 
using the idiom they’ve been using for decades and use a different one, just to 
get maybe at best the same performance they already have. Why does that sound 
reasonable to you?
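
For anyone who wants to check the memory claim on their own interpreter, here is a minimal sketch using tracemalloc. The helper names (build_with_join, build_with_stringio, peak_bytes) and the sample data are made up for illustration, and the numbers depend on the Python version and implementation, so treat it as a way to measure, not as a result.

```python
import io
import tracemalloc


def build_with_join(parts):
    # The established idiom: collect the pieces, join once at the end.
    chunks = []
    for part in parts:
        chunks.append(part)
    return "".join(chunks)


def build_with_stringio(parts):
    # The proposed alternative: write each piece into a StringIO.
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()


def peak_bytes(builder, parts):
    # Hypothetical helper: peak traced allocation while the builder runs.
    # The substrings in `parts` exist before tracing starts, so only the
    # builder's own bookkeeping and the final string are counted.
    tracemalloc.start()
    try:
        result = builder(parts)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


parts = ["line %d\n" % i for i in range(100_000)]  # pure ASCII sample data
joined, join_peak = peak_bytes(build_with_join, parts)
written, sio_peak = peak_bytes(build_with_stringio, parts)
print("join peak bytes:    ", join_peak)
print("StringIO peak bytes:", sio_peak)
```

Both builders produce the same string; the only thing to compare is the two peaks.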

> Whether StringIO takes advantage of that opportunity *right now* or not 
> is, in a sense, irrelevant. It's an opportunity that lists don't have. 
> Any (potential) inefficiency in StringIO could be improved, but it's 
> baked into the design of lists that it *must* keep each string as a 
> separate object.

The reason StringIO keeps a list (well, a C struct that’s almost the same thing 
as a list) is because it’s fast. It’s not the simplest implementation, it’s 
something that people put a lot of work into optimizing. 

Is it possible that someone could come up with something that’s even better for 
the main uses of StringIO (simulating a file), and that also happens to be 
good for use as a string builder? Sure, I suppose it’s possible. But do you 
really think we should make a change just so we can encourage people to switch 
to using something that’s slower and takes more memory (and doesn’t work in 
older versions of Python) just because it’s not impossible that one day someone 
will come up with a new optimization that makes it better instead of worse?

> And if some specific implementation happens to have a particularly 
> inefficient StringIO, that's a matter of quality of implementation and 
> something for the users of that specific interpreter to take up with its 
> maintainers. It's not a reason for us to reject Paul's proposal.

But if every implementation of StringIO, in every interpreter, is actually 
worse than joining lists, isn’t that a reason for us to reject the proposal?

>> And thinking about portable code makes it even worse. Your code might 
>> be run under CPython and take even more memory, or it might be run 
>> under a different Python implementation where StringIO is not 
>> accelerated (where it’s just a TextIOWrapper around a BytesIO) and 
>> therefore be a whole lot slower instead.
> 
> So wait, let me see if I understand your argument:
> 
> 1. CPython's string concatenation is absolutely fine, even though it is 
> demonstrably slower on 11 out of the 12 interpreters that Paul tested.

No. This is no part of my argument. The recommended way to handle building 
large strings out of lots of little strings is, and always has been, to join a 
list. It’s in the FAQ. It’s even baked into the code of CPython (see the error 
message from calling sum on strings). People should not be concatenating 
strings, but we don’t need to offer them a better solution because they already 
have a better solution.
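
For reference, the idiom in question, plus the sum() behaviour I mentioned, looks roughly like this. The build_report function and its rows are invented for the example; the TypeError from sum() is what CPython does, and other implementations may word the message differently.

```python
def build_report(rows):
    # Collect the small strings in a list, then join exactly once.
    pieces = []
    for name, value in rows:
        pieces.append(f"{name}={value}\n")
    return "".join(pieces)


report = build_report([("a", 1), ("b", 2)])
print(report, end="")

# sum() refuses a str start value; on CPython the message itself
# points the caller at ''.join.
try:
    sum(["a", "b"], "")
except TypeError as exc:
    sum_error = str(exc)
    print("sum() said:", sum_error)
```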

> 2. The mere possibility of even a single hypothetical Python interpreter 
> that has a slow and unoptimized StringIO buffer is enough to count 
> against Paul's proposal.

No, the fact of every real life Python interpreter having a StringIO that’s at 
least a little worse than string join, and in some cases a lot worse, is enough 
to rule out the proposal. (The facts that StringIO also has the wrong 
semantics, is less obvious for the purpose, and isn’t a decades-long 
established idiom are additional problems with the proposal. And the biggest 
problem is that the proposal is trying to fix a problem that doesn’t exist in 
the first place.)

> Is that correct, or have I missed some nuance to your defence of string 
> concatenation and rejection of Paul's proposal?

You haven’t missed any nuance, you’ve missed the entire point. I am not 
defending string concatenation, I’m defending the established idiom of join. I 
am not arguing to reject Paul’s proposal because it might theoretically be 
inefficient on some implementation, but because it definitely is inefficient on 
every existing implementation. And because it’s wrong to boot, and because it 
doesn’t solve any actual problem.
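
If you would rather measure speed than take my word for it, something like the following sketch is enough. The function names and the size of PARTS are arbitrary choices of mine, and I'm deliberately not quoting timings, because they vary by interpreter and version; run it on the implementations you care about.

```python
import io
import timeit

PARTS = ["x" * 10 for _ in range(10_000)]  # arbitrary sample data


def via_join():
    # Established idiom: append to a list, join once.
    chunks = []
    for p in PARTS:
        chunks.append(p)
    return "".join(chunks)


def via_stringio():
    # Proposed alternative: write into a StringIO, then getvalue().
    buf = io.StringIO()
    for p in PARTS:
        buf.write(p)
    return buf.getvalue()


def via_concat():
    # Repeated concatenation, the thing nobody is recommending.
    s = ""
    for p in PARTS:
        s += p
    return s


# Best of three runs of 20 calls each, per strategy.
timings = {
    func.__name__: min(timeit.repeat(func, number=20, repeat=3))
    for func in (via_join, via_stringio, via_concat)
}
for name, seconds in timings.items():
    print(f"{name:13s} {seconds:.4f}s")
```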

>> So it has to be able to deal 
>> with both of those possibilities, not just one; code that uses the 
>> usual idiom, on the other hand, behaves pretty similarly on all 
>> implementations.
> 
> The "usual idiom" being discussed here is repeated string concatenation, 

No it isn’t. The usual idiom is join.

It’s true that there are some people who never read the docs, never search 
StackOverflow or Python-list, never talk to other developers, etc., and abuse 
string concatenation. But giving them a second idiom isn’t going to change 
that—they’re still not going to read the docs, etc. We could give them 30 
better ways to do it, and that won’t be any better than giving them 1 way.

>>> My whole concern is along 2 lines:
>>> 
>>> 1. This StringBuilder class *could* be an existing io.StringIO.
>>> 2. By just adding __iadd__ operator to it.
>> 
>> No, it really couldn’t. The semantics are wrong (unless you want, say, 
>> universal newline handling in your string builder?),
> 
> Ah, now *that* is a good point.
> 
>> it’s optimized for a different use case than string building,
> 
> It is? That's odd. The whole purpose of StringIO is to build strings.
> 
> What use-case do you believe it is optimized for?

Guido already answered this; but let me ask a followup question: Why would you 
think a class that’s in the io module, that implements the text file ABC (and 
doesn’t implement a string-builder API, hence Paul’s proposal), and that’s 
documented as a way to be “an in-memory stream for text I/O” would be optimized 
for use as a string builder instead of for use as an in-memory file object?
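
To make the "file object, not string builder" point concrete, here is a small sketch of what StringIO actually offers. The variable names are mine, and the missing += support is true of the Python versions I know of; that gap is exactly what the proposal wants to change.

```python
import io
import operator

buf = io.StringIO("first\nsecond\n")

# It is a text file object: it implements the text I/O ABC.
is_text_file = isinstance(buf, io.TextIOBase)

# It has a file position, so you can read, seek, and overwrite in place.
first_line = buf.readline()
buf.seek(0)
buf.write("FIRST")
overwritten = buf.getvalue()

# It has newline handling, like any other text file. With newline=None,
# "\r\n" is translated to "\n" as the text is written.
translating = io.StringIO(newline=None)
translating.write("a\r\nb")
translated = translating.getvalue()

# What it does not have is a string-builder API such as +=.
try:
    operator.iadd(io.StringIO(), "text")
    supports_iadd = True
except TypeError:
    supports_iadd = False

print(is_text_file, repr(first_line), repr(overwritten))
print(repr(translated), supports_iadd)
```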

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UY2RRSPLRLLG7ILAPRXGEHLIUER34OYE/
Code of Conduct: http://python.org/psf/codeofconduct/
