> On Apr 11, 2020, at 06:52, Paul Sokolovsky <pmis...@gmail.com> wrote:
> 
>> And a StringBuilder class would be another way.
> 
> StringBuilder would be just subset of functionality of StringIO, and
> would be hard to survive Occam's razor. (Mine is sharp to cut it off
> right away.)

I think _this_ is actually the root of the disagreement.

A StringBuilder that does one thing and does it well survives Occam’s razor in 
lots of other languages, like Java. Why? That one thing could be done by a 
mutable string object, or by a string stream object, so why not just pile it 
into one of those instead? Because piling it into one of those means you run 
into conflicting requirements, which force you to make hard tradeoffs, possibly 
ones that are bad for other code, or that break assumptions existing code has 
relied on for years.

Python’s StringIO is readable as well as writable. (If I have a library that 
wants a file object, and I have the data in memory, I just wrap it in a 
StringIO and now I have that file object. People use it for this all the time.) 
It also has a current position pointer, and can seek back to previously marked 
locations. It has optional newline translation. It has all the behavior that a 
file object has to have, and code relies on that fact, and that forces design 
decisions on you that may not be optimal for a StringBuilder.
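For reference, here is a minimal sketch of that file-object behavior using only the standard io.StringIO. It wraps in-memory text for code that wants a file, marks a position with tell() and seeks back to it, and shows newline translation through the newline argument. read_header is a hypothetical stand-in for a library function that expects a file object.

```python
import io

def read_header(f):
    """Hypothetical library function: accepts any readable text file object."""
    return f.readline().rstrip("\n")

# Wrap in-memory data so it can be passed where a file object is expected.
buf = io.StringIO("name,age\nalice,30\nbob,25\n")
header = read_header(buf)

# StringIO keeps a current position: tell() marks it, seek() returns to it.
mark = buf.tell()
first_row = buf.readline()
buf.seek(mark)
again = buf.readline()          # re-reads the same row

# Optional newline translation: with newline=None, "\r\n" and "\r" are
# translated to "\n"; the default (newline="\n") leaves "\r" untouched.
translated = io.StringIO("one\r\ntwo\rthree\n", newline=None).readlines()
untranslated = io.StringIO("one\r\ntwo").readlines()
```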

It sounds like you already know the issues with mutable strings, so I won’t go 
over them here.

A stand-alone StringBuilder doesn’t have to do those things; it just has to 
append characters or strings to the end, and be able to give you a string when 
you’re done. So it can be optimal and at the same time dead simple. It can be 
nothing more than a dynamically-expanding array (or realloc buffer) of UCS4 
characters. Or, if you want to (usually) trade a bit of time for a lot of space 
savings, it can be a union of dynamically-expanding UCS1/2/4 character arrays 
(which has to reallocate and copy the first time you append an out-of-range 
character), but that’s still a whole lot simpler in a 
StringBuilder than in something that has to meet the str and PyUnicode APIs, or 
the file object APIs. Or you could design something more complicated if that 
turns out to work better. If any of these makes it hard to implement persistent 
seek positions that work even after you’ve reallocated, wastes overflow space 
when you’re using it just to read from an immutable input, etc., that would be 
completely irrelevant, because, unlike StringIO, nobody can ask a StringBuilder 
to do any of those things, so your design doesn’t have to support them.

Plus, looking beyond CPython, a new class can have whatever 
cross-implementation requirements we write into it. You can document that a 
StringBuilder doesn’t retain all of its input strings, but is at minimum 
roughly as efficient as making a list of strings and joining them anyway, and 
every Python implementation will do that (or just not implement the class at 
all, if they can’t, and document that fact, the reason why, and the recommended 
porting alternative very high up in a “differences from CPython” chapter), and 
any backport will too. You can’t document that about StringIO, because it would 
just be a lie for most existing implementations (including CPython 2.6-3.9, 
PyPy, etc.).
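The list-and-join baseline referred to above is the usual portable idiom today. It looks roughly like the sketch below, where render_rows is a hypothetical example function. Unlike the builder being described, it keeps every piece alive until the final join.

```python
def render_rows(rows):
    """Hypothetical example: build one string out of many small pieces."""
    parts = []                          # list.append is amortized O(1)
    for name, value in rows:
        parts.append(f"{name}={value}")
        parts.append("\n")
    # join allocates the result once and copies each piece into it.
    return "".join(parts)
```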

> I see, it's whole different concept for you. But as I mentioned,
> they're the same concept for me - both stream and buffer *are*
> protocols. And that's based on my desire to define Python as a generic
> programming language, based on a few consistent and powerful concepts.

Sure, buffers and streams are protocols, but they’re not the same protocol. A 
buffer is all about random access; a stream is not.
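Here is a quick illustration of that difference using built-in types. memoryview exposes the buffer protocol and can be indexed and sliced in any order. Plain read() calls on a stream such as io.BytesIO consume data from a current position. (BytesIO can also seek, since it is a full file object. This sketch only shows sequential reads.)

```python
import io

data = b"hello, world"

# Buffer protocol: slice anywhere, in any order; slicing a memoryview does
# not copy, and there is no "current position" to advance.
view = memoryview(data)
last = view[-5:].tobytes()      # tobytes() copies only so we can compare
first = view[:5].tobytes()

# Stream: each read() continues from where the previous one stopped.
stream = io.BytesIO(data)
chunk1 = stream.read(5)
chunk2 = stream.read(2)
position = stream.tell()
```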

And file is a protocol too. There are even ABCs for it. It’s also not the same 
protocol as the simpler thing you’re thinking of as stream, of course, but it’s 
certainly a protocol.
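The ABCs in question live in the io module. Here is a small sketch of checking the protocol rather than the concrete type:

```python
import io

sio = io.StringIO()
bio = io.BytesIO()

# The file protocol is expressed as a hierarchy of abstract base classes.
is_text = isinstance(sio, io.TextIOBase)          # text files
is_binary = isinstance(bio, io.BufferedIOBase)    # buffered binary files
both_files = isinstance(sio, io.IOBase) and isinstance(bio, io.IOBase)
text_is_binary = isinstance(sio, io.BufferedIOBase)

# Optional capabilities are queried through the protocol's own methods.
caps = (sio.readable(), sio.writable(), sio.seekable())
```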

And Python already is a generic language in your sense; most code is written 
around protocols like file and buffer and iterable and mapping and even number. 
Pythonic code, whenever possible, doesn’t care if I feed it a shelve instead of 
a dict, or a np.array of float64 instead of a float, or a StringIO instead of a 
TextIOWrapper around a FileIO. And people rely on that fact all the time. And 
you usually don’t even have to do anything special to make that true for your 
libraries.
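Here is a sketch of what that looks like for the file protocol. count_words is a hypothetical function that only iterates over lines, so it accepts a StringIO and a TextIOWrapper alike without doing anything special. io.BytesIO stands in for FileIO here so the example doesn't need a file on disk.

```python
import io

def count_words(f):
    """Hypothetical helper: works with any readable text file object."""
    return sum(len(line.split()) for line in f)

text = "the quick brown fox\njumps over the lazy dog\n"

# An in-memory StringIO...
from_stringio = count_words(io.StringIO(text))

# ...and a TextIOWrapper around a binary stream are interchangeable here.
wrapper = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8")
from_wrapper = count_words(wrapper)
```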

Your real problem seems to be just that you wish Python were designed around a 
simpler stream protocol instead of the big and messy file protocol.

Maybe that would be better. File could be a subtype or wrapper, or maybe even a 
collection of them that could be composed as needed—you don’t always need 
seekability just because you need newline conversion, or vice versa. Java’s 
granular streams design is actually pretty handy at times (and I think it’s 
completely orthogonal to their horrible and verbose API around getting, 
building, and using streams). Then maybe OutputStringStream would just 
obviously be usable as a builder (which is almost, but not quite, true for 
C++). And there might be other benefits too. (We could also definitely have a 
cleaner API for things like socket.makefile, which today looks like a file but 
raises on many operations.)

But that’s not the language we have. And it still won’t be the language we have 
if you add an __iadd__ method to StringIO. Making StringIO not be a 
fully-featured and optimal-for-file-like-usage file object isn’t an option, 
because you can’t break all the code that depends on it. The only way to get 
there from here would be to design a complete new stream system and get the 
vast majority of the Python ecosystem to switch over to using it. Which is a 
pretty huge ask. (And it still won’t let you just add __iadd__ to StringIO; 
it’ll only let you add __iadd__ to that new OutputStringStream.)
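The socket.makefile case mentioned earlier is easy to see concretely. It returns an object with the file interface that reports it can't seek and raises if you try. This is a minimal sketch; socket.socketpair() is used only so no network is needed.

```python
import socket

# socketpair() returns two connected sockets; no network access required.
a, b = socket.socketpair()
try:
    f = a.makefile("rb")            # looks like a binary file object...
    is_seekable = f.seekable()      # ...but says it cannot seek
    try:
        f.seek(0)
        seek_error = None
    except OSError as exc:          # io.UnsupportedOperation subclasses OSError
        seek_error = exc
    f.close()
finally:
    a.close()
    b.close()
```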

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TGEDOUOHSWPO53V3GYQ2PPTMVEWRZ4GV/
Code of Conduct: http://python.org/psf/codeofconduct/
