On Tue, Mar 31, 2020 at 12:21 PM Paul Sokolovsky <pmis...@gmail.com> wrote:

> Christopher Barker <python...@gmail.com> wrote:
> For avoidance of doubt: nothing in my RFC has anything to do, or
> implies, "a mutable string type".


I said "there are some use cases for a mutable string type"; I did not say
that's what was asked for in this thread.

So why did I say that? Because:


> A well-know pattern of string
> builder, yes.


As I read this suggestion, it started with something like:

* lots of people use a "pattern of string building", using str +=
another_string to build up strings.
* That is not an efficient pattern, and is considered an anti-pattern, even
in CPython, where it has been cleverly optimized.

I think everyone on this thread would agree with the above.

* The "official recommended solution" is another pattern: build up a
list, and then join it.

You are suggesting that it would be nice if there were an efficient
implementation of string building that followed the original anti-pattern's
syntax. After all, if folks want to make a string, then using familiar
string syntax would be nice and natural.

You've pointed out that StringIO already provides an efficient
implementation of string building (which could be made even more efficient,
if one wanted to write that code).

And that if it grew an __iadd__ method, it would then match the pattern
that you want it to match, and allow folks to improve their code with less
change than going to the list.append then join method.
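
A minimal sketch of what that could look like, assuming a subclass rather than a change to StringIO itself. The names StringBuilder and build_message are hypothetical and exist only for illustration; io.StringIO has no __iadd__ today.

```python
import io


class StringBuilder(io.StringIO):
    """Hypothetical StringIO subclass that accepts ``+=``."""

    def __init__(self, initial=""):
        # Write the initial text instead of passing it to StringIO(),
        # so the stream position ends up after it and later writes append.
        super().__init__()
        self.write(initial)

    def __iadd__(self, text):
        self.write(text)
        return self  # keep the name bound to the same builder object


def build_message(parts):
    # Same shape as the familiar str += loop, but backed by StringIO.
    message = StringBuilder("The start of the message")
    for part in parts:
        message += part
    return message.getvalue()
```

The loop body is unchanged from the str += version; only the first line and the final getvalue() call differ.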

All good.

But what struck me is that in the end, this is perhaps more friendly than
the list-based method, but it's still a real shift in paradigm: I think
people use str += str not because they are thinking "I need a string
builder", but because they are thinking: "I need a string". That is, your
choice of variable names:

buf = ""
for i in range(50000):
    buf += "foo"
print(buf)

is not what most folks would use, because they aren't thinking "I need a
buffer in which to put a bunch of strings", they are thinking: "I need to
make this big string", so would more likely write:

message = "The start of the message"
for i in something:
    message += "some more message"
do_something_with_the_message(message)

which, yes, is almost exactly the same as your example, but with a
different intent -- I start with a string and make it bigger, not "I make a
buffer in which to build a string, and then put things in it, then get the
resulting string out of the buffer".

I teach a lot of beginners, so yes, I do see this code pattern a fair bit.

The difference in intent means that folks are not likely to go looking for
a "buffer" or "string builder" anyway.

So that suggested to me that a mutable string type would completely satisfy
your use case, but be more natural to folks used to strings:

message = MutableString("The start of the message")
for i in something:
    message += "some more message"
do_something_with_the_message(message)

And you could do other nifty things with it, like all the string methods,
without a lot of wasteful reallocating, particularly for methods that don't
change the length of the string. (Though Unicode does make this a
challenge!) And yes, I know that the "wasteful reallocating" is probably
rarely, if ever, a bottleneck.

In short: a mutable string would satisfy the requirements of a "string
builder", and more.
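
To make the idea concrete, here is a rough sketch of such a type, assuming a plain list of characters as storage. MutableString is a hypothetical class, not something in the standard library, and a real implementation would need far more of the str API and a smarter storage scheme.

```python
class MutableString:
    """Hypothetical sketch: keeps characters in a list so edits happen in place."""

    def __init__(self, initial=""):
        self._chars = list(initial)

    def __iadd__(self, other):
        # Appending extends the list in place; no new string is built here.
        self._chars.extend(str(other))
        return self

    def __setitem__(self, index, char):
        # Only single-character replacement, so the length never changes.
        if not isinstance(char, str) or len(char) != 1:
            raise ValueError("expected a single character")
        self._chars[index] = char

    def __len__(self):
        return len(self._chars)

    def __str__(self):
        # The immutable str is only created when it is actually needed.
        return "".join(self._chars)
```

With this, message = MutableString("The start of the message") followed by message += "some more message" behaves like the loop above, and str(message) gives the final result.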

Anyway, as I said in my previous message, the fact that a mutable string
hasn't gained any traction tells us something: it really isn't that
important.

And I mentioned a similar effort I made to make a growable numpy array,
and, well, it turned out not to be worth it either.

However, if we're all wrong, and there would be a demand for such a "string
builder", then why not write one (could be a wrapper around StringIO if you
want), and put it on PyPI, or even just in your own lib, and see how it
catches on?

Have you done that for your own code and found you like it enough to really
want to push this forward?

BTW: I timed += vs StringIO vs list + join, and found (like you did) that
they are all about the same speed under CPython 3.7. But I had a thought
-- might string interning be affecting the performance? Particularly for
the list method:

def list_join():
    buf = []
    for i in range(10000):
        buf.append("foo")
    return "".join(buf)

Note that this only makes one string, "foo", and reuses it for every item
in the list. In the common case, you wouldn't get that help.

OK, tested it: no, it doesn't really make a difference. If you replace "foo"
(which gets interned) with "foo "[:3] (which doesn't), they all take
longer, but still all about the same.
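
For anyone who wants to repeat the comparison, a sketch along these lines should work. It is not the exact code I ran: the helper names (concat, string_io, list_join, shared, fresh, time_all) are made up for this example, and the absolute numbers will depend on the machine and Python version.

```python
import io
import timeit

N = 10_000


def concat(make_piece):
    buf = ""
    for _ in range(N):
        buf += make_piece()
    return buf


def string_io(make_piece):
    buf = io.StringIO()
    for _ in range(N):
        buf.write(make_piece())
    return buf.getvalue()


def list_join(make_piece):
    buf = []
    for _ in range(N):
        buf.append(make_piece())
    return "".join(buf)


def shared():
    # Returns the same literal "foo" object on every call.
    return "foo"


def fresh():
    # Builds a new "foo" string object on every call.
    return "".join(("f", "oo"))


def time_all(make_piece, repeat=5):
    # Total seconds for `repeat` runs of each approach.
    return {
        func.__name__: timeit.timeit(lambda: func(make_piece), number=repeat)
        for func in (concat, string_io, list_join)
    }


if __name__ == "__main__":
    for make_piece in (shared, fresh):
        print(make_piece.__name__, time_all(make_piece))
```

The extra function call per iteration adds the same overhead to all three approaches, so the relative comparison should still be meaningful.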

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FC54PZERYRX37MQSCUALB5UAGJOYDFRB/
Code of Conduct: http://python.org/psf/codeofconduct/