Hello,

On Tue, 31 Mar 2020 04:27:04 +1100
Chris Angelico <ros...@gmail.com> wrote:

[]

> There's a vast difference between "mutable string" and "string
> builder". The OP was talking about this kind of thing:
> 
> buf = ""
> for i in range(50000):
>     buf += "foo"
> print(buf)
> 
> And then suggested using a StringIO for that purpose. But if you're
> going to change your API, just use a list:
> 
> buf = []
> for i in range(50000):
>     buf.append("foo")
> buf = "".join(buf)
> print(buf)

I appreciate you expressing it all concisely and clearly. So let me
respond here instead of to the very first '"".join() rules!' reply I
got. The issue with "".join() is obvious:

------
import io
import sys


def strio():
    sb = io.StringIO()
    for i in range(50000):
        sb.write(u"==%d==" % i)
    print(sys.getsizeof(sb) + sys.getsizeof(sb.getvalue()))

def listjoin():
    sb = []
    sz = 0
    for i in range(50000):
        v = u"==%d==" % i
        # All individual strings will be kept in the list and
        # can't be GCed before the final join.
        sz += sys.getsizeof(v)
        sb.append(v)
    s = "".join(sb)
    sz += sys.getsizeof(sb)
    sz += sys.getsizeof(s)
    print(sz)

strio()
listjoin()
------

$ python3.6 memuse.py 
439083
3734325


So, it's obvious, but let's formulate it clearly for the avoidance of
doubt:

There's absolutely no reason why the trivial operation of accumulating
string content should take about an order of magnitude more memory than
is actually needed for that string content (here, roughly 8.5 times
more: 3734325 vs 439083 bytes). Don't get me wrong - if you want to
spend that much of your memory, then sure, you can. But jumping in with
that as *the only right solution* whenever somebody mentions "string
concatenation" is a bit ... umm, cavalier.
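
As a cross-check on the sys.getsizeof() bookkeeping above, here is a
minimal sketch that measures peak allocation with CPython's tracemalloc
module instead. The helper names (build_with_stringio, build_with_join,
peak_bytes) are made up for illustration, and the numbers will differ
between Python versions and implementations, so no expected output is
given.

```python
import io
import tracemalloc


def build_with_stringio(n):
    # Accumulate into a StringIO; pieces are not kept by our own code.
    sb = io.StringIO()
    for i in range(n):
        sb.write("==%d==" % i)
    return sb.getvalue()


def build_with_join(n):
    # Accumulate into a list; every piece stays alive until the join.
    parts = []
    for i in range(n):
        parts.append("==%d==" % i)
    return "".join(parts)


def peak_bytes(func, n):
    # Hypothetical helper: peak traced allocation while func(n) runs.
    tracemalloc.start()
    try:
        result = func(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, len(result)


if __name__ == "__main__":
    for func in (build_with_stringio, build_with_join):
        peak, length = peak_bytes(func, 50000)
        print(func.__name__, "peak bytes:", peak, "result length:", length)
```

tracemalloc only sees allocations made through Python's allocators, so
treat the figures as an estimate, not as the process's real footprint.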

> This is going to outperform anything based on StringIO fairly easily,

Since when is raw speed the only criterion for performance? If you say
"forever", I'll believe it only if you go on to show the assembly code
with SSE and AVX which you wrote to squeeze out those last cycles.

Otherwise, being able to complete operations in a reasonable amount of
memory, not running out of memory (OOM), not being DoSed by trivial
means, and finally, serving 8 times more requests in the same amount of
memory are all valid criteria too.

What's interesting is that, so far, the discussion parallels almost
1-to-1 the discussion in the 2006 thread I linked from the original
mail.

> So if you really want a drop-in replacement, don't build it around
> StringIO, build it around list.
> 
> class StringBuilder:
>     def __init__(self): self.data = []
>     def __iadd__(self, s): self.data.append(s); return self
>     def __str__(self): return "".join(self.data)

But of course! And what's most important, nowhere did I talk about what
should be inside this class. My whole concern is along 2 lines:

1. This StringBuilder class *could* be the existing io.StringIO.
2. By just adding an __iadd__ operator to it.
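
To make the two points concrete, here is a minimal sketch of what that
could look like from the user's side. The subclass name StringBuilder
is only for illustration; the actual proposal is to put __iadd__ on
io.StringIO itself, and how the buffer is stored stays up to each
implementation.

```python
import io


class StringBuilder(io.StringIO):
    """Illustrative sketch: io.StringIO plus an in-place += operator."""

    def __iadd__(self, s):
        # Write at the current stream position; for a buffer that is
        # only ever written to, that is always the end.
        self.write(s)
        # __iadd__ must return the object the name gets rebound to.
        return self


buf = StringBuilder()
for i in range(3):
    buf += "foo"
print(buf.getvalue())  # prints "foofoofoo"
```

Existing code that uses `buf += "foo"` on a str would then need only
its initialisation and the final getvalue() call changed.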

That's it, nothing else. What's inside the StringIO class is up to you
(dear various Python implementations, their maintainers, and
contributors). For example, fans of "".join() surely can have it
inside. Actually, it's a known fact that Python 2's "StringIO" module
(the original home of the StringIO class) was implemented exactly like
that, so you can go straight back to the future.


And again, the need for anything like that might be unclear to
CPython-only users. Such users can write a StringBuilder class like the
one above, or repeat the beautiful "".join() trick over and over again.
The need for a nice string builder class arises only from considering
that Python-as-a-language lacks a clear and nice abstraction for it,
and from thinking about how to add such an abstraction in a performant
way (for which the criteria differ) in as many implementations as
possible, as easily as possible. (At least that's my path to it; I'm
not sure whether a different thought process might lead to it too.)

-- 
Best regards,
 Paul                          mailto:pmis...@gmail.com
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KULQBPYYHF6LG46E2LJB2IW5EUFKFAKB/