[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Mon, Mar 30, 2020 at 10:08:06PM -0700, Guido van Rossum wrote:
> On Mon, Mar 30, 2020 at 10:00 PM Steven D'Aprano 
> wrote:
> 
> > > it’s optimized for a different use case than string building,
> >
> > It is? That's odd. The whole purpose of StringIO is to build strings.

I misspoke: it is not the *whole* purpose. See below.


> > What use-case do you believe it is optimized for?
> >
> 
> Let me tell you, since I was there.
> 
> StringIO was created in order to fit code designed to write to a file, where all you
> want to do is capture its output and process it further, in the same
> process. (Or vice versa for the reading case of course.) IOW its *primary*
> feature is that it is a duck type for a file, and that is what it's
> optimized for. Also note that it only applies to use cases where the data
> does, indeed, fit in the process's memory somewhat easily -- else you
> should probably use a temporary file. If the filesystem were fast enough
> and temporary files were easier to use we wouldn't have needed it.

But it does that by *building a string*, does it not? That's what the 
getvalue() method is for.

Perhaps we're talking past each other. I'm aware that the purpose of 
StringIO is to offer a file-like API with an in-memory object that 
doesn't require external storage on the file system. (Hence my 
retraction above about "whole purpose".)

But it still has to do this by returning a string. (In the case of 
writing to a StringIO object, obviously.)



-- 
Steven
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EIZYFHFYAGQNM6I7MAJO333UHQKICEBQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Tue, Mar 31, 2020 at 03:01:51PM +1100, Steven D'Aprano wrote:

> > nor the fastest way 
> 
> It's pretty close though.
> 
> On my test, accumulating 500,000 strings into a list versus a StringIO 
> buffer, then building a string, took 27.5 versus 31.6 ms. Using a string 
> took 36.4 ms. So it's faster than the optimized string concat, and 
> within arm's reach of list+join.

I re-ran the test with a single non-ASCII character added to the very 
end, '\U0001D400'. Both the list and the StringIO versions slowed down 
by about the same amount of time (approx 4ms) so the difference between 
them remained the same in absolute terms but shrank marginally in 
relative terms. YMMV.


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2G64USGBXSNXTOEQRNQV6IVFZCNMXGG6/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Guido van Rossum
On Mon, Mar 30, 2020 at 10:00 PM Steven D'Aprano 
wrote:

> > it’s optimized for a different use case than string building,
>
> It is? That's odd. The whole purpose of StringIO is to build strings.
>
> What use-case do you believe it is optimized for?
>

Let me tell you, since I was there.

StringIO was created in order to fit code designed to write to a file, where all you
want to do is capture its output and process it further, in the same
process. (Or vice versa for the reading case of course.) IOW its *primary*
feature is that it is a duck type for a file, and that is what it's
optimized for. Also note that it only applies to use cases where the data
does, indeed, fit in the process's memory somewhat easily -- else you
should probably use a temporary file. If the filesystem were fast enough
and temporary files were easier to use we wouldn't have needed it.
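
A minimal sketch of the duck typing described here, using only the standard library: `write_report` is a hypothetical function written for a file object, and passing it an `io.StringIO` captures its output in memory, which `getvalue()` then returns as a string. The last lines show the reading direction.

```python
import io


def write_report(f, rows):
    # Hypothetical function written for a file: it only ever calls f.write().
    for name, value in rows:
        f.write(f"{name}: {value}\n")


# Capture the output in memory instead of writing it to disk.
buf = io.StringIO()
write_report(buf, [("alpha", 1), ("beta", 2)])
report = buf.getvalue()
print(report, end="")

# The reading case: code that iterates over a file can read from a string.
lines = [line.rstrip("\n") for line in io.StringIO("first\nsecond\n")]
print(lines)
```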

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YNMIUHNCVNLD5A2N2C4GOZBT6O7CZEM5/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Mon, Mar 30, 2020 at 01:59:42PM -0700, Andrew Barnert via Python-ideas wrote:

[...]
> When you call getvalue() it then builds a Py_UCS4* 
> representation that’s in this case 4x the size of the final string 
> (since your string is pure ASCII and will be stored in UCS1, not 
> UCS4). And then there’s the final string.
> 
> So, if this memory issue makes join unacceptable, it makes your 
> optimization even more unacceptable.

You seem to be talking about a transient spike in memory usage, as the 
UCS4 string is built then disposed of. Paul seems to be talking about 
holding on to large numbers of substrings for long periods of time, 
possibly minutes or hours or even days in the case of a long running 
process.

If StringIO.getvalue() builds an unnecessary UCS4 string, that's an 
obvious opportunity for optimization. Regardless of whether people use 
StringIO by calling the write() method or Paul's proposed `+=`, this 
optimization might still be useful. 

In any case, throw in one emoji into your buffer, just one, and the 
whole point becomes moot. Whether you are using StringIO or list.append 
plus join, you still end up with a UCS4 string at the end.
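
A small CPython-specific sketch of that point (exact `sys.getsizeof` numbers vary by version and platform): a single character outside the Basic Multilingual Plane, here U+1D400, makes the whole string use 4 bytes per character, whether it was built with list+join or with StringIO.

```python
import io
import sys

# 1000 ASCII characters: CPython can store these at 1 byte per character.
ascii_text = "a" * 1000
# Same length, but the last character (U+1D400) lies outside the BMP,
# so the whole string needs 4 bytes per character.
wide_text = "a" * 999 + "\U0001D400"

# Building the wide text with list+join or with StringIO gives the same
# result; the single astral character decides the width of the final string.
parts = ["a"] * 999 + ["\U0001D400"]
buf = io.StringIO()
for p in parts:
    buf.write(p)
joined = "".join(parts)

print(sys.getsizeof(ascii_text), sys.getsizeof(wide_text))
```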
 
I don't understand the CPython implementation very well; I barely know 
any C at all, but your argument seems a bit dubious to me. Regardless of 
the implementation, if you accumulate N code points, it takes a minimum 
of N times the width of a code point to store that buffer. With a StringIO 
buffer, there is at least the opportunity to keep them all in a single 
buffer with minimal overhead:

buf --> []  # four code points, each of 4 bytes in UCS4

With a list, you have significantly more overhead. For the sake of 
discussion, let's say you build it from four one-character strings.

lst --> []  # four pointers to str objects

Each pointer will take eight bytes on modern 64-bit systems, so that's 
already double the size of buf. Then there is the object overhead of the 
four strings, which is *particularly* acute for single ASCII chars: 50 
bytes for a one-byte ASCII char. So in the worst case, every char you 
add to your buffer takes 58 bytes in a list versus 4 for a StringIO that 
uses UCS4 internally.
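
A rough, CPython-specific way to see the per-object overhead: `sys.getsizeof` is shallow, so this sketch sums the list itself and every element separately. The strings are deliberately distinct (no interning or sharing), which puts it closer to the worst case than the best.

```python
import sys

n = 10_000
# Distinct 8-character ASCII strings, so no two list items share an object.
parts = [f"{i:08d}" for i in range(n)]
joined = "".join(parts)

# List approach: the list's array of pointers plus every separate str
# object, each carrying its own object header.
list_total = sys.getsizeof(parts) + sum(sys.getsizeof(p) for p in parts)
# The same characters stored once, contiguously, in a single str.
string_total = sys.getsizeof(joined)

print(list_total, string_total, round(list_total / string_total, 1))
```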

Whether StringIO takes advantage of that opportunity *right now* or not 
is, in a sense, irrelevant. It's an opportunity that lists don't have. 
Any (potential) inefficiency in StringIO could be improved, but it's 
baked into the design of lists that it *must* keep each string as a 
separate object.

Of course there are only 128 unique ASCII characters, and interning 
reduces some of that overhead. But even in the best case where you are 
appending large strings there's always going to be more memory overhead 
in a list that a buffer has the opportunity to avoid.

And if some specific implementation happens to have a particularly 
inefficient StringIO, that's a matter of quality of implementation and 
something for the users of that specific interpreter to take up with its 
maintainers. It's not a reason for us to reject Paul's proposal.


> And thinking about portable code makes it even worse. Your code might 
> be run under CPython and take even more memory, or it might be run 
> under a different Python implementation where StringIO is not 
> accelerated (where it’s just a TextIOWrapper around a BytesIO) and 
> therefore be a whole lot slower instead.

So wait, let me see if I understand your argument:

1. CPython's string concatenation is absolutely fine, even though it is 
demonstrably slower on 11 out of the 12 interpreters that Paul tested.

2. The mere possibility of even a single hypothetical Python interpreter 
that has a slow and unoptimized StringIO buffer is enough to count 
against Paul's proposal.

Is that correct, or have I missed some nuance to your defence of string 
concatenation and rejection of Paul's proposal?


> So it has to be able to deal 
> with both of those possibilities, not just one; code that uses the 
> usual idiom, on the other hand, behaves pretty similarly on all 
> implementations.

The "usual idiom" being discussed here is repeated string concatenation, 
which certainly does not behave similarly on all implementations. 
Unless, of course, you're referring to it performing *really poorly* on 
all implementations except CPython.


> > My whole concern is along 2 lines:
> > 
> > 1. This StringBuilder class *could* be an existing io.StringIO.
> > 2. By just adding __iadd__ operator to it.
> 
> No, it really couldn’t. The semantics are wrong (unless you want, say, 
> universal newline handling in your string builder?),

Ah, now *that* is a good point.
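
For readers unfamiliar with the newline point: `io.StringIO` accepts the same `newline` argument as text files, and depending on its value it translates line endings as text is written. A short sketch of three cases; the default, `newline="\n"`, stores text exactly as written.

```python
import io

# Default (newline="\n"): no translation at all.
plain = io.StringIO()
plain.write("one\r\ntwo\n")
print(repr(plain.getvalue()))

# newline=None: universal-newlines mode, "\r\n" and "\r" are stored as "\n".
universal = io.StringIO(newline=None)
universal.write("one\r\ntwo\rthree\n")
print(repr(universal.getvalue()))

# newline="\r\n": every "\n" written is stored as "\r\n".
crlf = io.StringIO(newline="\r\n")
crlf.write("one\ntwo\n")
print(repr(crlf.getvalue()))
```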

> it’s optimized for a different use case than string building,

It is? That's odd. The whole purpose of StringIO is to build strings.

What use-case do you believe it is optimized for?


-- 
Steven

[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Mon, Mar 30, 2020 at 04:25:07PM -0700, Christopher Barker wrote:
> As others have pointed out, the OP started in a bit of an oblique way, but
> it may come down to this:
> 
> There are some use-cases for a mutable string type. And one could certainly
> write one.

With respect Christopher, this is a gross misrepresentation of what Paul 
has asked for. He is not asking for a mutable string type. If that isn't 
clear from the subject line of this thread, it ought to be clear from 
Paul's well-written and detailed post, which carefully explains what he 
wants.


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2ZS7KSUXWOK6NWTOUPTD4LRUB4F3PKFJ/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Mon, Mar 30, 2020 at 10:07:30AM -0700, Andrew Barnert via Python-ideas wrote:

> Why? What’s the benefit of building a mutable string around a virtual 
> file object wrapped around a buffer (with all the extra complexities 
> and performance costs that involves, like incremental Unicode encoding 
> and decoding) instead of just building it around a buffer directly?

The quote about adding another abstraction layer solving every problem
except the problem of having too many abstraction layers comes to mind.

But let's please not hijack this proposal by making it about a 
full-blown mutable string object. Paul's proposal is simple: add `+=` to 
StringIO and BytesIO as an alias for `.write`.
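
A minimal sketch of what the proposal amounts to, written as a subclass so that it runs today. `StringBuilder` is a hypothetical name; the standard `io.StringIO` has no `__iadd__`, and the actual proposal would add the method to StringIO and BytesIO themselves.

```python
import io


class StringBuilder(io.StringIO):
    """Hypothetical sketch of the proposal: `+=` as an alias for write()."""

    def __iadd__(self, s):
        self.write(s)
        # Returning self keeps `buf` bound to the same buffer after `buf += s`.
        return self


buf = StringBuilder()
for s in ["spam", " ", "eggs"]:
    buf += s
print(buf.getvalue())
```

With a buffer like this, existing `buf += s` code only needs a different initialisation and a final `getvalue()` call, which is the heart of Paul's refactoring argument.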

We have the str concat optimization to cater for people who want to 
concatenate strings using `buf += str`. You are absolutely right that 
the correct cross-platform way of doing it is to accumulate a list then 
join it, but that's an idiom that doesn't come easily to many people. 
Hence even people who know better sometimes prefer the `buf += str` 
idiom, and hence the repeated arguments about making join a list method.

(But you must accumulate the list with append, not with list 
concatenation, or you are back to quadratic behaviour.)
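
A sketch of the distinction in that parenthetical: `lst = lst + [s]` copies the whole list on every iteration, while `append` (and, as it happens, `lst += [s]`, which extends the list in place) does not. The function names are illustrative only.

```python
def build_with_append(strings):
    parts = []
    for s in strings:
        parts.append(s)       # mutates the existing list: amortized O(1)
    return "".join(parts)


def build_with_list_concat(strings):
    parts = []
    for s in strings:
        parts = parts + [s]   # copies the whole list every time: O(n**2) overall
    return "".join(parts)


def build_with_list_iadd(strings):
    parts = []
    for s in strings:
        parts += [s]          # list.__iadd__ extends in place, like append
    return "".join(parts)


# The results are identical; only the cost of getting there differs.
words = ["spam", "eggs", "ham"]
print(build_with_append(words), build_with_list_concat(words),
      build_with_list_iadd(words))
```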

It seems to me that the least invasive change to write efficient, good 
looking code is Paul's suggestion to use StringIO or BytesIO with the 
proposed `+=` operator. Side by side:

# best read using a fixed-width font
buf = ''                buf = []                  buf = io.StringIO()
for s in strings:       for s in strings:         for s in strings:
    buf += s                buf.append(s)             buf += s
                        buf = ''.join(buf)        buf = buf.getvalue()

Clearly the first is prettiest, which is why people use it. (It goes 
without saying that *pretty* is a matter of opinion.) It needs no extra 
conversion at the end, which is nice. But it's not cross-platform, and 
even in CPython it's a bit risky.

The middle is the most correct, but honestly, it's not that pretty. Many 
people *really* hate the fact that join is a string method and would 
rather write `buf.join('')`.

The third is, in my opinion, quite nice. With the status quo 
`buf.write(s)`, it's much less nice.

Paul's point about refactoring should be treated more seriously. If you 
have code that currently has a bunch of `buf += s` scattered around in 
many places, changing to the middle idiom is difficult:

1. you have to change the buffer initialisation;
2. you have to add an extra conversion to the end;
3. and you have to change every single `buf += s` to `buf.append(s)`.

With Paul's proposal, 1 and 2 still apply, but that's just two lines. 
Three if you include the `import io`. But step 3 is gone. You don't have 
to change any of the buffer concatenations to appends.

Now that's not such a big deal when all of the concatenations are right 
there in one little loop, but if they are scattered around dozens of 
methods or functions it can be a significant refactoring step.

> More generally, a StringIO is neither the obvious way 

If I were new to Python, and wanted to build a string, and knew that 
repeated concatenation was slow, I'd probably look for some sort of 
String Builder or String IO class before thinking of *list append*. 
Especially if I came from a Java background.

> nor the fastest way 

It's pretty close though.

On my test, accumulating 500,000 strings into a list versus a StringIO 
buffer, then building a string, took 27.5 versus 31.6 ms. Using a string 
took 36.4 ms. So it's faster than the optimized string concat, and 
within arm's reach of list+join.
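
A sketch of how such a comparison might be reproduced with `timeit`; the function names are made up, and absolute timings will differ from the figures above depending on machine and interpreter. Passing `suffix='\U0001D400'` appends one non-ASCII string, as in the follow-up re-run.

```python
import io
import timeit


def with_concat(strings):
    buf = ''
    for s in strings:
        buf += s          # relies on CPython's in-place concat optimization
    return buf


def with_list(strings):
    buf = []
    for s in strings:
        buf.append(s)
    return ''.join(buf)


def with_stringio(strings):
    buf = io.StringIO()
    for s in strings:
        buf.write(s)      # status quo; the proposal would allow `buf += s`
    return buf.getvalue()


def compare(n=500_000, suffix=''):
    strings = ['abc'] * n + ([suffix] if suffix else [])
    for func in (with_concat, with_list, with_stringio):
        # Best of three single runs, reported in milliseconds.
        best = min(timeit.repeat(lambda: func(strings), number=1, repeat=3))
        print(f"{func.__name__:>14}: {best * 1000:.1f} ms")


if __name__ == "__main__":
    compare()
    compare(suffix='\U0001D400')
```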

Replacing buf.write with `+=` might, theoretically, shave off a bit of 
the overhead of attribute lookup. That would close the distance a 
fraction. And maybe there are other future optimizations that could 
follow. Or maybe not.


> nor the recommended way to build strings on the fly in Python, so 
> why do you agree with the OP that we need to make it better for that 
> purpose? Just to benefit people who want to write C++ instead of 
> Python?

If writing `buf += s` is writing C++ instead of Python, then you have 
spent much of this thread defending the optimization added in version 
2.4 to allow people to write C++ instead of Python. So why are you 
suddenly against it now when the underlying buffer changes from str to 
StringIO?

When I was younger and still smarting from being on the losing side of 
the Pascal vs C holy wars, I really hated the idea of adding `+=` to 
Python because it would encourage people to write C instead of Python. I 
got over it :-)


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FMY6HEBW4A7AVQDDADNMMTIS66TP5CDB/

[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread David Mertz
I have myself been "guilty" of using the problem style for N < 10.  In
fact, I had forgotten about the optimization even, since my uses are
negligible time.

For stuff like this, it's fast no matter what:

for clause in query_clauses:
    sql += clause

Maybe I have a WHERE or two.  Maybe an ORDER BY.  Etc.  But if I'm sure
there won't be more than 6 such clauses to the query I'm building, so what?
Or probably likewise with bits of a file path, or a URL with optional
parameters, and a few other things.

On Mon, Mar 30, 2020 at 11:15 PM David Mertz  wrote:

> Does anyone know if any linters find and warn about the `string += word`
> in a loop pattern? It feels like a linter would be the place to do that.  I
> don't think we could possibly make it an actual interpreter warning given
> borderline OK uses (or possibly even preferred ones).  But a little nagging
> in tooling could draw attention.
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JBHFZIONF2U3MJKICM6AEJFEPZ4UGRUM/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread David Mertz
Does anyone know if any linters find and warn about the `string += word` in
a loop pattern? It feels like a linter would be the place to do that.  I
don't think we could possibly make it an actual interpreter warning given
borderline OK uses (or possibly even preferred ones).  But a little nagging
in tooling could draw attention.
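
One way such a check could look, as a minimal sketch using only the standard `ast` module (not how any existing linter implements it): it flags every `name += ...` inside a `for` or `while` loop. Because it is purely syntactic, it cannot tell strings from numbers, so each hit is only a possible quadratic concatenation, which suits a gentle nag rather than a hard error.

```python
import ast


def find_augadd_in_loops(source):
    """Return (lineno, name) for every `name += ...` inside a loop.

    Purely syntactic: `name` might be an int or a list, so every hit is
    only a *possible* repeated string concatenation.
    """
    hits = []

    class Visitor(ast.NodeVisitor):
        def __init__(self):
            self.loop_depth = 0

        def _visit_loop(self, node):
            self.loop_depth += 1
            self.generic_visit(node)
            self.loop_depth -= 1

        visit_For = visit_While = visit_AsyncFor = _visit_loop

        def visit_AugAssign(self, node):
            if (self.loop_depth and isinstance(node.op, ast.Add)
                    and isinstance(node.target, ast.Name)):
                hits.append((node.lineno, node.target.id))
            self.generic_visit(node)

    Visitor().visit(ast.parse(source))
    return hits


sample = """
s = ''
for word in words:
    s += word
total = 0
total += 1
"""
print(find_augadd_in_loops(sample))
```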



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BQHEM3BSKSVBHYTYJIYMQJ5GREXXRM4F/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Mon, Mar 30, 2020 at 12:37:48PM -0700, Andrew Barnert via Python-ideas wrote:
> On Mar 30, 2020, at 12:00, Paul Sokolovsky  wrote:
> > Roughly speaking, to support efficient appending, one needs to
> > be ready to over-allocate string storage, and maintain bookkeeping for
> > this. Another known optimization CPython does is for stuff like "s =
> > s[off:]", which requires maintaining another "offset" pointer. Even
> > with this simplistic consideration, internal structure of "str" would
> > be about the same as "io.StringIO" (which also needs to over-allocate
> > and maintain "current offset" pointer). But why, if there's io.StringIO
> > in the first place?
> 
> Because io.StringIO does _not_ need to do that.
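
To make the over-allocation Paul describes concrete, here is a toy pure-Python sketch: a buffer that keeps spare capacity plus a count of used slots, and grows geometrically so that repeated appends cost amortized constant time. `OverallocatingBuffer` is hypothetical and is not how any interpreter's StringIO is actually written; storing one-character strings in a list shows the bookkeeping, not real memory savings.

```python
class OverallocatingBuffer:
    """Toy buffer: spare capacity plus a count of used slots."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity  # allocated storage, partly unused
        self._length = 0                 # number of slots actually in use

    @property
    def capacity(self):
        return len(self._slots)

    def append(self, s):
        needed = self._length + len(s)
        if needed > len(self._slots):
            # Grow geometrically (at least doubling), so the total cost of
            # copying during growth stays linear in the final length.
            new_slots = [None] * max(needed, 2 * len(self._slots))
            new_slots[:self._length] = self._slots[:self._length]
            self._slots = new_slots
        # Copy the new characters into the spare space, one per slot.
        self._slots[self._length:needed] = s
        self._length = needed

    def getvalue(self):
        return "".join(self._slots[:self._length])


buf = OverallocatingBuffer(capacity=2)
for piece in ["ab", "c", "defg"]:
    buf.append(piece)
print(buf.getvalue(), buf.capacity)
```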

The same comment can be made that str does not need to implement the 
in-place concat optimization either. And yet it does, in CPython if not 
any other interpreter.

It seems to me that Paul makes a good case that, unlike the string 
concat optimization, just about every interpreter could add this to 
StringIO without difficulty or great cost. Perhaps they could even get 
together and agree to all do so.

But unless CPython does so too, it won't do them much good, because 
hardly anyone will take advantage of it. When one platform dominates 90% 
of the ecosystem, one can sensibly write code that depends on that 
platform's specific optimizations, but going the other way, not so much.

The question that comes to my mind is not whether StringIO *needs* to do 
this, but whether there is any significant cost to doing this?

Of course there is *some* cost: somebody has to do the work, and it 
won't be me. But once done, is there any significant maintenance cost 
beyond what there would be without it? Is there any downside?


[...]
> And it doesn’t allow you to do random-access seeks to arbitrary 
> character positions.

Sorry, I don't see why random access to arbitrary positions is relevant 
to a discussion about concatenation. What am I missing?


-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5YFFTUKDTT7YCE7YTEZFXHG2ATVNBEN5/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Steven D'Aprano
On Mon, Mar 30, 2020 at 10:24:02AM -0700, Andrew Barnert via Python-ideas wrote:
> On Mar 30, 2020, at 10:01, Brett Cannon  wrote:

[talking about string concatenation]
 
> > I don't think characterizing this as a "mis-optimization" is fair. 
[...]
> Yes. A big part of the reason there’s so much use in the wild is that 
> for small cases that aren’t in the middle of a bottleneck, it’s 
> perfectly reasonable for people to add two or three strings and not 
> care about performance. (Who cares about N**2 when N<=15 and it 
> happens at most 4 times per run of your program?) 

When you're talking about N that small (2 or 4, say), it is quite 
possible that the overhead of constructing a list then looking up and 
calling a method may be greater than that of string concatenation, even 
without the optimization. I wouldn't want to bet either way without 
benchmarks, and I wouldn't trust the benchmarks from one machine to 
apply to another.


> So people do it, and 
> it’s fine. When they really do need to optimize, a quick search of the 
> FAQ or StackOverflow or whatever will tell them the right way to do 
> it, and they do it, but most of the time it doesn’t matter.

Ah, but that's the rub. How often do they know they need to do that 
"quick search"? Unless they get bitten by poor performance, and spend 
the time to profile their script and discover the cause of the slow 
down, how would they know what the cause was?

If people already know about the string concatenation trap, they don't 
need a quick search, and they're probably not writing repeated 
concatenation for arbitrary N in the first place.

Although I have come across a few people who are completely dismissive 
of the idea of using cross-platform best practices. Even actively 
hostile to the idea that they should avoid idioms that will perform 
badly on other interpreters.

On the third hand, if they don't know about the trap, then it won't be a 
quick search because they don't know what to search for (unless it's 
"why is Python so slow?" which won't be helpful).



Disclaimer: intellectually, I like the CPython string concatenation 
optimization. It's clever, a Neat Hack, I really admire it. But I can't 
help feeling that, *just maybe*, it's a misplaced optimization, and if 
it were proposed today when we are more concerned about alternative 
interpreters, we might not have accepted it.

Perhaps if CPython didn't dominate the ecosystem so completely, and more 
people wrote cross-platform code that was run across multiple 
interpreters, we wouldn't be quite so keen on an optimization that 
encourages quadratic behaviour half the time. So even though I don't 
*quite* agree with Paul, I can see that from the perspective of people 
using alternate interpreters, this CPython optimization could easily be 
characterized as a mis-optimization.

"Why is CPython encouraging people to use an idiom that is all but 
guaranteed to be hideously slow on everyone else's interpreter?"

Since Brett brought up the notion of fairness, one might even be 
forgiven for considering that such an optimization in the reference 
interpreter, knowing that most of the other interpreters cannot match 
it, is an unfair, aggressive, anti-competitive action.

Personally I wouldn't go quite so far. But I can see why people who are 
passionate about alternate interpreters might feel that this optimization 
is both harmful and unfair on the greater Python ecosystem.

Apart from cross-platform issues, another risk with the concat 
optimization is that it's quite fragile and sensitive to the exact form 
of your code. A small, seemingly insignificant change to your code can 
have enormous consequences:

In [1]: strings = ['abc']*500000

In [2]: %%timeit
   ...: s = ''
   ...: for x in strings:
   ...:     s = s+x
   ...:
36.4 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %%timeit
   ...: s = ''
   ...: for x in strings:
   ...:     s = t = s+x
   ...:
59.7 s ± 799 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

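That's more than a thousand times slower. The only difference is the 
extra name `t`, which keeps a second reference to the string, so CPython 
can no longer resize it in place and every iteration copies the whole 
string.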

And I think people often underestimate how painful it can be to debug 
performance problems caused by this. If you haven't been burned by it 
before, it may not be obvious just how risky repeated concatenation can 
be. Here is an example from real life.

In 2009, about four years after the in-place string concatenation 
optimization was added to CPython, Chris Withers asked for help 
debugging a problem where Python httplib was literally hundreds of times 
slower than other tools, like wget and Internet Explorer:

https://mail.python.org/pipermail/python-dev/2009-August/091125.html

A few weeks later, Simon Cross realised the problem was probably the 
quadratic behaviour of repeated string addition:

https://mail.python.org/pipermail/python-dev/2009-September/091582.html

leading to this quote from Antoine Pitrou:

"Given differences between platforms in realloc() performance, it might 
be the reason why 

[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Chris Angelico
On Tue, Mar 31, 2020 at 10:25 AM Christopher Barker  wrote:
>
> As others have pointed out, the OP started in a bit of an oblique way, but 
> it may come down to this:
>
> There are some use-cases for a mutable string type. And one could certainly 
> write one.
>
> presto: here is one:
>
> https://github.com/Daniil-Kost/mutable_strings
>
> Which looks to me to be more a toy than anything, but maybe the author is 
> seriously using it... (it does look like it has a bug indexing if there are 
> non-ASCII characters)
>
> And yet, as far as I know, there has never been one that was carefully 
> written and optimized, which would be a bit of a trick, because of how Python 
> strings handle Unicode. (it would have been a lot easier with Python2 :-) )
>

You mean, it's a lot easier to write bytearray? :)
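
The quip refers to the built-in `bytearray`, which is already a mutable sequence of bytes with an in-place `+=`. A short sketch; note that it holds bytes, so text has to be encoded on the way in and decoded on the way out.

```python
buf = bytearray()
for word in ["spam", "eggs"]:
    buf += word.encode("utf-8")   # extends the same bytearray in place

alias = buf
buf += b"!"
same_object = alias is buf        # += did not create a new object

buf[0:4] = b"SPAM"                # mutable: slices can be reassigned
text = buf.decode("utf-8")
print(text, same_object)
```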

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3DU6HB26CJRHQRZGJ73EBNN3ME3UZT6S/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Christopher Barker
As others have pointed out, the OP started in a bit of an oblique way, but
it may come down to this:

There are some use-cases for a mutable string type. And one could certainly
write one.

presto: here is one:

https://github.com/Daniil-Kost/mutable_strings

Which looks to me to be more a toy than anything, but maybe the author is
seriously using it... (it does look like it has a bug indexing if there
are non-ASCII characters)

And yet, as far as I know, there has never been one that was carefully
written and optimized, which would be a bit of a trick, because of how
Python strings handle Unicode. (it would have been a lot easier with
Python2 :-) )

So why not?

1) As pointed out, high-performance strings are key to a lot of coding, so
Python's str is very baked into a LOT of code, and can't be duck-typed. I
know that pretty much the only time I ever type check (as opposed to simple
EAFP duck typing) is for str. So if one were to make a mutable string
type, you'd have to convert it to a string a lot in order to use most other
libraries.

That being said, one could write a mutable string that mirrored the
CPython string types as much as possible, and it could be pretty efficient,
even for making regular strings out of it.

2) Maybe it's really not that useful. Other than building up a long string
with a bunch of small ones (which can be done fine with .join()), I'm not
sure I've had much of a use case -- it would buy you a tiny bit of
performance for, say, altering strings in ways that don't change their
length, but I doubt there's many (if any) applications that would see any
meaningful benefit from that.

So I'd say it hasn't been done because (1) it's a lot of work and (2) it
would be a bit of a pain to use, and not gain much at all.

A kind-of-related anecdote:

numpy arrays are mutable, but you cannot change their length in place. So,
as with strings, if you want to build up an array with a lot of little
pieces, then the best way is to put all the pieces in a list, and then make
an array out of it when you are done.

I had a need to do that fairly often (reading data from files of unknown
size) so I actually took the time to write an array that could be extended.

Turns out that:

1) it really wasn't much faster (than using a list) in the usual use-cases
anyway :-)
2) it did save memory -- which only mattered for monster arrays, and I'd
likely need to do something smarter anyway in those cases.

I even took some time to write a Cython-optimized version, which only
helped a little. I offered it up to the numpy community.

But in the end: no one expressed much interest. And I haven't used it
myself for anything in a long while.

Moral of the story: not much point in a special class to do something that
can already be done almost as well with the builtins.

-CHB






On Mon, Mar 30, 2020 at 2:06 PM Paul Sokolovsky  wrote:

> Hello,
>
> On Tue, 31 Mar 2020 07:40:01 +1100
> Chris Angelico  wrote:
>
> > On Tue, Mar 31, 2020 at 7:04 AM Paul Sokolovsky 
> > wrote:
> > > for i in range(5):
> > >     v = u"==%d==" % i
> > >     # All individual strings will be kept in the list and
> > >     # can't be GCed before the final join.
> > >     sz += sys.getsizeof(v)
> > >     sb.append(v)
> > > s = "".join(sb)
> > > sz += sys.getsizeof(sb)
> > > sz += sys.getsizeof(s)
> > > print(sz)
> > >
> >
> > > ... about order of magnitude more memory ...
> >
> > I suspect you may be multiply-counting some of your usage here. Rather
> > than this, it would be more reliable to use the resident set size (on
> > platforms where you can query that).
>
> I may humbly suggest a different process too: get any hardware
> board with MicroPython and see how much data you can collect in a
> StringIO and in a list of strings. Well, you actually don't need
> dedicated hardware; just get a Linux or Windows version and run it
> with a specific heap size using a -X heapsize= switch, e.g. -X
> heapsize=100K.
>
> Please don't stop there; we are talking about multiple implementations,
> so try it on CPython too. There must be a similar option there (how
> else could you perform any memory-related testing?); I just forgot
> which.
>
> The results should be very apparent; only a forgotten option may
> obscure them.
>
> []
>
> --
> Best regards,
>  Paul  mailto:pmis...@gmail.com


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

[Python-ideas] Re: Compound with .. else statement

2020-03-30 Thread Jimmy Thrasibule
>
> Perhaps you could use try/finally:
>
> try:
>     prepare()
>     do_something_sensitive()
> finally:
>     cleanup()
>

Well I actually would like to run the else block in case an exception did
occurred.

Let me provide an example from my use case which is the management of a
database transaction:


with savepoint(transaction_manager):
    # Let's try to add into the database with some constraints.
    obj = db.add(data)
    db.flush()
else:
    # Object already in database.
    obj = db.get(data)


With the following context manager:


class savepoint(object):
    def __init__(self, tm):
        # __enter__ takes no arguments, so the manager is passed in here.
        self._tm = tm
        self._sp = None

    def __enter__(self):
        self._sp = self._tm.savepoint()
        return self

    def __exit__(self, exc_ty, exc_val, tb):
        if exc_ty is not None:
            # Undo the partial work; with the proposed syntax an
            # IntegrityError would then execute the else block.
            self._sp.rollback()
            # Returning False lets the exception propagate.
            return False

        # All good, we commit our transaction.
        self._sp.commit()
        return False


I find it quite a pretty "try, and fall back" pattern that I can easily
replicate in my code without having to prepare and clean up each time with
a try/except.
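For what it's worth, the same get-or-create flow can already be written by wrapping the `with` statement in try/except, so that the except clause plays the role of the proposed `else`. The sketch below is self-contained and swaps the thread's transaction manager for SQLite savepoints via the standard sqlite3 module; the `users` table, `get_or_create()` and this generator-based `savepoint()` are hypothetical stand-ins, not the API used above, and it assumes `isolation_level=None` so that the SAVEPOINT statements control the transaction.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def savepoint(conn, name="sp"):
    """Run the with-body inside an SQLite savepoint (hypothetical helper).

    On success the savepoint is released (committed if it is the outermost
    one). On any exception the work is rolled back to the savepoint and the
    exception is re-raised, so the caller decides what the fallback is.
    """
    conn.execute(f"SAVEPOINT {name}")
    try:
        yield conn
    except BaseException:
        conn.execute(f"ROLLBACK TO {name}")
        conn.execute(f"RELEASE {name}")
        raise
    else:
        conn.execute(f"RELEASE {name}")


def get_or_create(conn, email):
    """Insert a user, or fetch the existing id if the UNIQUE check fails."""
    try:
        with savepoint(conn):
            cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Object already in database: the branch the proposed else would hold.
        row = conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
        return row[0]


if __name__ == "__main__":
    # isolation_level=None leaves transaction control to the SAVEPOINT statements.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    print(get_or_create(conn, "a@example.com"))
    print(get_or_create(conn, "a@example.com"))  # same id, no duplicate row
```

The design choice here is that the context manager only guarantees atomicity and always re-raises; deciding what "already exists" means stays with the caller's except clause.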

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/V5FHTPRRHQBKMLFPCRLBHTRQGEB6WTMB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Compound with .. else statement

2020-03-30 Thread Christopher Barker
On Mon, Mar 30, 2020 at 3:19 PM Serhiy Storchaka 
wrote:

> 31.03.20 00:27, Jimmy Thrasibule wrote:
> > In my situation, I would like to mix the `with` statement with `else`.
> > In this case, if no exception is raised within the `with`, I would
> > like the `else` part to run.
>
> It is easy. You do not need "else".
>
> with my_context():
>   do_something_sensitive()
> print("We're all safe.")


In case Serhiy's answer wasn't clear: context managers can be written to
handle exceptions (within their context) in any way you see fit.

That is, the method

__exit__(self, exc_type, exc_value, exc_traceback)

gets the exception, and information about it, if one is raised, so you can
handle it any way you want.
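As a small illustration, here is a hypothetical context manager (the `Transactional` name and the log entries are made up) whose `__exit__` does its cleanup only when the body raised, and returns False so the exception still propagates. The statement right after the `with` block then runs only on success, which is the behaviour the proposed `else` would provide.

```python
class Transactional:
    """Hypothetical context manager: heavy cleanup only when the body fails."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("prepare")
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_type is None:
            self.log.append("commit")
        else:
            self.log.append(f"cleanup after {exc_type.__name__}")
        # A false return value lets any exception propagate to the caller.
        return False


def run(body):
    log = []
    try:
        with Transactional(log):
            body()
        # Reached only when the with-body finished without an exception.
        log.append("else")
    except ValueError:
        log.append("handled")
    return log
```

Returning True from `__exit__` would instead swallow the exception, and the code after the `with` block would run in both cases.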

-CHB





-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7SXSGIENYXWKWTJFC7LELWT3XVH3EUOV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Compound with .. else statement

2020-03-30 Thread Serhiy Storchaka

31.03.20 00:27, Jimmy Thrasibule wrote:

In my situation, I would like to mix the `with` statement with `else`.
In this case, if no exception is raised within the `with`, I would
like the `else` part to run.


It is easy. You do not need "else".

with my_context():
 do_something_sensitive()
print("We're all safe.")
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/M7HVLCGQ6REMHKGZVZPIYTJGLN6WKN5F/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Compound with .. else statement

2020-03-30 Thread Dan Sommers
On Mon, 30 Mar 2020 23:27:19 +0200
Jimmy Thrasibule  wrote:

> Now imagine that in my `try .. except` block I have some heavy setup
> to do before `do_something_sensitive()` and some heavy cleanup when
> the exception occurs.

> I'd like my context manager to do the preparation work, execute the
> body, and clean up when an exception occurs, and to execute my else
> block only if there is no exception.

> Is there already a way to accomplish this in Python, or would this be a
> nice-to-have?

Perhaps you could use try/finally:

try:
    prepare()
    do_something_sensitive()
finally:
    cleanup()

Whether the call to prepare goes inside or outside the try block depends
on many things, mostly its coupling to the cleanup procedure (e.g., do
they need to share objects? is cleanup idempotent?).
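The try/finally above runs cleanup() unconditionally. If, as in Jimmy's description, the heavy cleanup should run only when something fails, one standard-library option is contextlib.ExitStack: register the cleanup as a callback and cancel it with pop_all() once the body has succeeded. A sketch in which prepare, cleanup and guarded are placeholder names with toy bodies that just record what ran:

```python
from contextlib import ExitStack


def prepare(log):
    log.append("prepare")  # stand-in for the heavy setup


def cleanup(log):
    log.append("cleanup")  # stand-in for the heavy, failure-only cleanup


def guarded(body, log):
    with ExitStack() as stack:
        prepare(log)
        # From here on, any exception triggers cleanup() on the way out.
        stack.callback(cleanup, log)
        body(log)
        # Success: move the callbacks to a new stack that is never closed,
        # so leaving the with block does not run cleanup().
        stack.pop_all()
    log.append("else")  # reached only when body() did not raise
```

Here prepare() is deliberately called before the callback is registered, so a failing prepare() does not trigger cleanup(); that is exactly the coupling question raised above.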

HTH,
Dan

-- 
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DGUP3WAOVLVC74TIOCVTMKMZKORTTON3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Compound with .. else statement

2020-03-30 Thread Jimmy Thrasibule
Hi,

In Python, there are multiple [compound
statements](https://docs.python.org/3/reference/compound_stmts.html)
with the `else` keyword.

For example:

```
for x in iterable:
    if x == sentinel:
        break
else:
    print("Sentinel not found.")
```

or:

```
try:
    do_something_sensitive()
except MyError:
    print("Oops!")
else:
    print("We're all safe.")
```

In my situation, I would like to mix the `with` statement with `else`.
In this case, if no exception is raised within the `with`, I would like
the `else` part to run.

For example:

```
with my_context():
    do_something_sensitive()
else:
    print("We're all safe.")
```

Now imagine that in my `try .. except` block I have some heavy setup
to do before `do_something_sensitive()` and some heavy cleanup when
the exception occurs.

I'd like my context manager to do the preparation work, execute the
body, and clean up when an exception occurs, and to execute my else
block only if there is no exception.

Is there already a way to accomplish this in Python, or would this be a
nice-to-have?

Regards,
Jimmy
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EAMYECTRMYSXYYYCKA3BPPFUKA3IPUW5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Paul Sokolovsky
Hello,

On Tue, 31 Mar 2020 07:40:01 +1100
Chris Angelico  wrote:

> On Tue, Mar 31, 2020 at 7:04 AM Paul Sokolovsky 
> wrote:
> > for i in range(5):
> >     v = u"==%d==" % i
> >     # All individual strings will be kept in the list and
> >     # can't be GCed before the final join.
> >     sz += sys.getsizeof(v)
> >     sb.append(v)
> > s = "".join(sb)
> > sz += sys.getsizeof(sb)
> > sz += sys.getsizeof(s)
> > print(sz)
> >  
> 
> > ... about order of magnitude more memory ...  
> 
> I suspect you may be multiply-counting some of your usage here. Rather
> than this, it would be more reliable to use the resident set size (on
> platforms where you can query that).

I may humbly suggest a different process too: get any hardware
board with MicroPython and see how much data you can collect in a
StringIO and in a list of strings. Actually, you don't even need
dedicated hardware: just get the Linux or Windows version and run it
with a specific heap size using the -X heapsize= switch, e.g.
-X heapsize=100K.

Please don't stop there; we're talking about multiple implementations,
so try it on CPython too. There must be a similar option there (how
else could you perform any memory-related testing?), I just forgot
which.

The results should be very apparent, and only a forgotten option could
obscure them.
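As far as I know, CPython has no direct counterpart to MicroPython's -X heapsize switch. On Unix, resource.setrlimit() with RLIMIT_AS can cap a process's address space; for measuring rather than capping, there is tracemalloc (also available as the -X tracemalloc command-line option), which reports peak Python-level allocations. A sketch comparing the two approaches from this thread; the builder function names are hypothetical, and the numbers will vary with Python version and platform:

```python
import io
import tracemalloc


def build_with_stringio(n):
    sb = io.StringIO()
    for i in range(n):
        sb.write("==%d==" % i)
    return sb.getvalue()


def build_with_join(n):
    parts = []
    for i in range(n):
        parts.append("==%d==" % i)
    return "".join(parts)


def peak_bytes(builder, n):
    """Return (result, peak traced bytes) for a single call of builder(n)."""
    tracemalloc.start()
    try:
        result = builder(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    n = 100_000  # raise this to see how the gap develops
    for builder in (build_with_stringio, build_with_join):
        _, peak = peak_bytes(builder, n)
        print(f"{builder.__name__}: peak {peak / 1024:.0f} KiB")
```

Unlike summing sys.getsizeof() values, the peak covers every allocation made during the call, including intermediate buffers and the final string.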

[]

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZWKHUVQUMTUIGKXHGXG2AA3F35VUD2Y4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Andrew Barnert via Python-ideas
On Mar 30, 2020, at 13:06, Paul Sokolovsky  wrote:
> 
> I appreciate expressing it all concisely and clearly. Then let me
> respond here instead of the very first '"".join() rules!' reply I got.

Ignoring replies doesn’t actually answer them.

> The issue with "".join() is very obvious:
> 
> --
> import io
> import sys
> 
> 
> def strio():
>     sb = io.StringIO()
>     for i in range(5):
>         sb.write(u"==%d==" % i)
>     print(sys.getsizeof(sb) + sys.getsizeof(sb.getvalue()))

This doesn’t tell you anything useful. As the help for getsizeof makes clear, 
“Only the memory consumption directly attributed to the object is accounted 
for, not the memory consumption of objects it refers to”. So this gives you 
some fixed value like 152, no matter how big the buffer and other internal 
objects may be.
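A quick way to see that sys.getsizeof() is shallow is to compare two lists of the same length whose elements differ greatly in size. Adding the elements as well, once per distinct object, also avoids the multiply-counting mentioned elsewhere in the thread. flat_total() is a hypothetical helper that only handles a flat list of strings, and the sizes assume CPython:

```python
import sys

short_items = ["x"] * 3
long_items = ["x" * 10_000] * 3

# Same number of slots, so the same size: the elements are not counted.
print(sys.getsizeof(short_items) == sys.getsizeof(long_items))  # True


def flat_total(strings):
    """List object plus each distinct element, counted once by identity."""
    per_object = {id(s): sys.getsizeof(s) for s in strings}
    return sys.getsizeof(strings) + sum(per_object.values())


# long_items holds one shared 10,000-char string three times; it counts once.
print(flat_total(long_items) < 2 * 10_000)  # True
```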

If you’re using CPython with the C accelerator, none of those things are 
available to you from the API, but a quick scan of the C source shows what’s 
there, and it’s generally actually more storage than the list version. 
Oversimplifying a bit: While you’re building, it keeps a _PyAccu structure, 
which is basically a wrapper around that same list of strings. When you call 
getvalue() it then builds a Py_UCS4* representation that’s in this case 4x the 
size of the final string (since your string is pure ASCII and will be stored in 
UCS1, not UCS4). And then there’s the final string.

So, if this memory issue makes join unacceptable, it makes your optimization 
even more unacceptable.

And thinking about portable code makes it even worse. Your code might be run 
under CPython and take even more memory, or it might be run under a different 
Python implementation where StringIO is not accelerated (where it’s just a 
TextIOWrapper around a BytesIO) and therefore be a whole lot slower instead. So 
it has to be able to deal with both of those possibilities, not just one; code 
that uses the usual idiom, on the other hand, behaves pretty similarly on all 
implementations.

> There's absolutely no reason why the trivial operation of
> accumulating string content should take about an order of magnitude more
> memory than is actually needed for that string content. Don't get me wrong
> - if you want to spend that much of your memory, then sure, you can. But
> jumping to that as *the only right solution* whenever somebody
> mentions "string concatenation" is a bit ... umm, cavalier

And making a wild guess about how things might be implemented and offering an 
optimization based on that guess that actually makes things worse and refusing 
to even reply when people point out the problems isn’t even more cavalier?

> My whole concern is along 2 lines:
> 
> 1. This StringBuilder class *could* be an existing io.StringIO.
> 2. By just adding __iadd__ operator to it.

No, it really couldn’t. The semantics are wrong (unless you want, say, 
universal newline handling in your string builder?), it’s optimized for a 
different use case than string building, and both the pure-Python and CPython 
accelerator implementations are less efficient in speed and/or memory.
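To make the newline point concrete: io.StringIO takes a `newline` argument as a text file does, and depending on it the stored text is not necessarily what was written. A string builder would never be expected to do this. The sample text is an arbitrary mix of line endings:

```python
import io

text = "a\nb\r\nc\rd"

# Default newline="\n": stored exactly as written.
print(repr(io.StringIO(text).getvalue()))  # 'a\nb\r\nc\rd'

# newline=None: universal newlines, so "\r\n" and "\r" become "\n".
print(repr(io.StringIO(text, newline=None).getvalue()))  # 'a\nb\nc\nd'

# newline="\r\n": every "\n" written is translated to "\r\n".
print(repr(io.StringIO(text, newline="\r\n").getvalue()))  # 'a\r\nb\r\r\nc\rd'
```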

> That's it, nothing else. What's inside StringIO class is up to you (dear
> various Python implementations, their maintainers, and contributors).

Sure, but what’s inside has to actually perform the job it was designed to do 
and is documented to do: to simulate a file object in memory. Which is not the 
same thing as being a string builder.

> For example, fans of "".join() surely can have it inside. Actually,
> it's a known fact that Python2's "StringIO" module (the original home
> of StringIO class) was implemented exactly like that, so you can go
> straight back to the future.

Python2’s StringIO module is for bytes, not Unicode strings. If you want a 
mutable bytes-like type, bytearray already exists; there’s no need to wrap the 
sequence up in a file-like API just to rewrap that in a sequence-like API 
again; just use the sequence directly. What StringIO is there for is when you 
_need_ the file API, just as in Python 3’s io.BytesIO. It’s not a more 
efficient bytearray or one better suited for string building; it’s less 
efficient and less well suited for string building but it adds different 
features.
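For bytes, the sequence-based builder described above looks like this: bytearray supports in-place `+=`, and io.BytesIO is the file-like alternative for when a caller genuinely needs write() and getvalue(). A minimal sketch; the `==%d==` payload is borrowed from the thread, and bytes %-formatting needs Python 3.5 or later:

```python
import io

# A mutable bytes buffer: in-place += appends without any file API.
buf = bytearray()
for i in range(3):
    buf += b"==%d==" % i
print(bytes(buf))  # b'==0====1====2=='

# The file-like equivalent, for code that expects a writable stream.
stream = io.BytesIO()
for i in range(3):
    stream.write(b"==%d==" % i)
print(stream.getvalue() == bytes(buf))  # True
```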

> And again, the need for anything like that might be unclear to
> CPython-only users. Such users can write a StringBuilder class like
> the one above, or repeat the beautiful "".join() trick over and over
> again. The need for a nice string builder class arises only when you
> consider that Python-as-a-language lacks a clear and nice abstraction
> for it, and think about how to add such an abstraction in a performant
> way (where the performance criteria differ) in as many implementations
> as possible, as easily as possible. (At least that's my path to it; I'm
> not sure whether a different thought process might lead to it too.)

The problem isn’t your start, it’s jumping to the assumption that StringIO must 
be an answer, and then not checking the 

[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Chris Angelico
On Tue, Mar 31, 2020 at 7:04 AM Paul Sokolovsky  wrote:
> for i in range(5):
>     v = u"==%d==" % i
>     # All individual strings will be kept in the list and
>     # can't be GCed before the final join.
>     sz += sys.getsizeof(v)
>     sb.append(v)
> s = "".join(sb)
> sz += sys.getsizeof(sb)
> sz += sys.getsizeof(s)
> print(sz)
>

> ... about order of magnitude more memory ...

I suspect you may be multiply-counting some of your usage here. Rather
than this, it would be more reliable to use the resident set size (on
platforms where you can query that).

import resource  # Unix-only; ru_maxrss is in KiB on Linux

if "strio" in sys.argv: strio()
else: listjoin()
print("Max RSS:", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

Based on that, I find that it's at worst a 4:1 difference. Plus, I
couldn't see any material difference - the numbers were within half a
percent, basically just noise - until I upped your loop counter to
400,000, nearly ten times as much as you were doing. (At that point it
became a 2:1 difference. The 4:1 didn't show up until a lot later.) So
you have to be working with a *ridiculous* number of strings before
there's anything to even consider.

And even then, it's only notable if the individual strings are short
AND all unique. Increasing the length of the strings basically made it
a wash. Consider:

for i in range(100):
    sb.write(u"==%d==" % i + "*"*1024)

Max RSS: 2028060

for i in range(100):
    v = u"==%d==" % i + "*"*1024

Max RSS: 2104204

So at this point, the string join is slightly faster and takes
slightly more memory - within 20% on the time and within 5% on the
memory.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/253V2BEV5UTMBKGRFOWM4Z4OOTJIALN7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Paul Sokolovsky
Hello,

On Mon, 30 Mar 2020 12:37:48 -0700
Andrew Barnert  wrote:

> On Mar 30, 2020, at 12:00, Paul Sokolovsky  wrote:
> > Roughly speaking, to support efficient appending, one needs to
> > be ready to over-allocate string storage, and maintain bookkeeping
> > for this. Another known optimization CPython does is for stuff like
> > "s = s[off:]", which requires maintaining another "offset" pointer.
> > Even with this simplistic consideration, internal structure of
> > "str" would be about the same as "io.StringIO" (which also needs to
> > over-allocate and maintain "current offset" pointer). But why, if
> > there's io.StringIO in the first place?  
> 
> Because io.StringIO does _not_ need to do that. It’s documented to
> act like a TextIOWrapper around a BytesIO.

You miss the point of my RFC - it says it *can* do that, for free. And
it *can* be documented as a class to perform very reasonable string
construction across various Python implementations. And any Python
implementation providing StringIO can pick it up very easily.

I hear you, you say "no need". Noted, thanks for detailed feedback.
(It's point 4.1 in the RFC, "there's no problem with CPython3, so there's
nothing to fix").

[]

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LM7VV66FGB3NV3KMK7OENWHABVWMULHV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Paul Sokolovsky
Hello,

On Tue, 31 Mar 2020 04:27:04 +1100
Chris Angelico  wrote:

[]

> There's a vast difference between "mutable string" and "string
> builder". The OP was talking about this kind of thing:
> 
> buf = ""
> for i in range(5):
>     buf += "foo"
> print(buf)
> 
> And then suggested using a StringIO for that purpose. But if you're
> going to change your API, just use a list:
> 
> buf = []
> for i in range(5):
>     buf.append("foo")
> buf = "".join(buf)
> print(buf)

I appreciate expressing it all concisely and clearly. Then let me
respond here instead of the very first '"".join() rules!' reply I got.
The issue with "".join() is very obvious:

--
import io
import sys


def strio():
    sb = io.StringIO()
    for i in range(5):
        sb.write(u"==%d==" % i)
    print(sys.getsizeof(sb) + sys.getsizeof(sb.getvalue()))

def listjoin():
    sb = []
    sz = 0
    for i in range(5):
        v = u"==%d==" % i
        # All individual strings will be kept in the list and
        # can't be GCed before the final join.
        sz += sys.getsizeof(v)
        sb.append(v)
    s = "".join(sb)
    sz += sys.getsizeof(sb)
    sz += sys.getsizeof(s)
    print(sz)

strio()
listjoin()
--

$ python3.6 memuse.py 
439083
3734325


So, it's obvious, but let's formulate it clearly for avoidance of
doubt:

There's absolutely no reason why the trivial operation of
accumulating string content should take about an order of magnitude more
memory than is actually needed for that string content. Don't get me wrong
- if you want to spend that much of your memory, then sure, you can. But
jumping to that as *the only right solution* whenever somebody
mentions "string concatenation" is a bit ... umm, cavalier.

> This is going to outperform anything based on StringIO fairly easily,

Since when is raw speed the only criterion for performance? If you say
"forever", I'll believe it only if you go on to show the assembly code
with SSE and AVX that you wrote to squeeze out those last cycles.

Otherwise, being able to complete operations in a reasonable amount of
memory, not running out of memory, not being DoSed by trivial means, and,
finally, serving 8 times more requests in the same amount of memory are
all valid criteria too.

What's interesting is that so far this discussion almost exactly parallels
the discussion in the 2006 thread I linked from the original mail.

> So if you really want a drop-in replacement, don't build it around
> StringIO, build it around list.
> 
> class StringBuilder:
>     def __init__(self): self.data = []
>     def __iadd__(self, s): self.data.append(s)
>     def __str__(self): return "".join(self.data)

But of course! And, most importantly, nowhere did I say what
should be inside this class. My whole concern is along 2 lines:

1. This StringBuilder class *could* be an existing io.StringIO.
2. By just adding __iadd__ operator to it.

That's it, nothing else. What's inside StringIO class is up to you (dear
various Python implementations, their maintainers, and contributors).
For example, fans of "".join() surely can have it inside. Actually,
it's a known fact that Python2's "StringIO" module (the original home
of StringIO class) was implemented exactly like that, so you can go
straight back to the future.


And again, the need for anything like that might be unclear to
CPython-only users. Such users can write a StringBuilder class like
the one above, or repeat the beautiful "".join() trick over and over
again. The need for a nice string builder class arises only when you
consider that Python-as-a-language lacks a clear and nice abstraction
for it, and think about how to add such an abstraction in a performant
way (where the performance criteria differ) in as many implementations
as possible, as easily as possible. (At least that's my path to it; I'm
not sure whether a different thought process might lead to it too.)

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KULQBPYYHF6LG46E2LJB2IW5EUFKFAKB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Andrew Barnert via Python-ideas
On Mar 30, 2020, at 12:00, Paul Sokolovsky  wrote:
> Roughly speaking, to support efficient appending, one needs to
> be ready to over-allocate string storage, and maintain bookkeeping for
> this. Another known optimization CPython does is for stuff like "s =
> s[off:]", which requires maintaining another "offset" pointer. Even
> with this simplistic consideration, internal structure of "str" would
> be about the same as "io.StringIO" (which also needs to over-allocate
> and maintain "current offset" pointer). But why, if there's io.StringIO
> in the first place?

Because io.StringIO does _not_ need to do that. It’s documented to act like a 
TextIOWrapper around a BytesIO. And the pure-Python implementation (as used by 
some non-CPython implementations of Python) is actually implemented that way: 
https://github.com/python/cpython/blob/3.8/Lib/_pyio.py#L2637. Every read and 
write to a StringIO passes through the incremental newline processor and the 
incremental UTF-8 codec to get passed on to a BytesIO. That’s not remotely 
optimal. And it doesn’t allow you to do random-access seeks to arbitrary 
character positions.

It’s true that the C accelerator for io.StringIO used by CPython uses a dynamic 
overallocated array of UCS4 instead, but you can’t rely on that portably any 
more than you can rely on CPython’s str.__iadd__
optimization portably. Plus, it’s optimized for typical file-like usage, not 
for typical string-like usage, so the resize rules aren’t the same; there’s no 
attempt to optimize storage for all-Latin or all-BMP text; and so on. Plus, it 
still has to deal with file-ish things like universal newline support which you 
not only don’t need, but explicitly want to not be there.

> (*) Instead, there are various practical hacks to implement it, as
> both the 2006 thread and this one show.

No, there is one idiomatic way to do it: create a list of strings and join 
them. That’s not a “hack” any more than using a string builder class or a 
string stream/file class is a “hack”. The fact that the standard Python idiom, 
the standard Java idiom, and the standard C++ idiom for building strings are 
all different is not a defect in any of those three languages; they’re all 
perfectly reasonable. And changing Python to have two standard idioms instead 
of one (with the new one less efficient and more complicated) would not be an 
improvement.

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RV4QGKKU4OQVP4RVHFIYP5OQCDV2OTYO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Andrew Barnert via Python-ideas
On Mar 30, 2020, at 10:18, Joao S. O. Bueno  wrote:
> 
> That said, anyone could tell about small, efficient, 
> well maintained "mutable string" classes on Pypi?

I don’t know of one. But what do you actually want it for? In most cases where 
you want “mutable strings”, what you really want is either a string builder 
(just wrap up a list of strings and join), or something that (unlike a list, 
array.array, etc.) provides insert and delete of substrings in better than 
linear time, like a gap buffer or rope or tree-indexed thing or similar (and 
there are good PyPI libraries for some of those things). But if you actually 
have a use for a simple mutable string that had the str API plus the 
MutableSequence API and performs roughly like array.array('Q') but with 
substrings instead of their codepoint int values, I don’t think anyone’s built 
that.
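For the second category mentioned above, a gap buffer is the simplest of those structures. Here is a hypothetical, minimal pure-Python sketch (not any particular PyPI library): the text lives in a list with a movable hole, so edits near the previous edit are cheap, while jumping far away costs time proportional to the distance. Storing one Python object per character, it is nowhere near as compact as a real implementation would be.

```python
class GapBuffer:
    """A minimal gap buffer of characters (hypothetical sketch).

    The text is buf[:gap_start] + buf[gap_end:]; slots in between are free.
    Inserting or deleting at the gap is cheap; moving the gap costs time
    proportional to the distance moved.
    """

    def __init__(self, text="", capacity=16):
        size = max(capacity, 2 * len(text))
        self._buf = list(text) + [None] * (size - len(text))
        self._gap_start = len(text)
        self._gap_end = size

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def __str__(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])

    def _move_gap(self, pos):
        # Shift characters across the gap until it starts at logical pos.
        while self._gap_start > pos:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _ensure_gap(self, needed):
        # Grow geometrically so repeated inserts at the gap stay cheap.
        if self._gap_end - self._gap_start >= needed:
            return
        head = self._buf[:self._gap_start]
        tail = self._buf[self._gap_end:]
        size = max(2 * len(self._buf), len(head) + len(tail) + needed)
        self._buf = head + [None] * (size - len(head) - len(tail)) + tail
        self._gap_end = size - len(tail)

    def insert(self, pos, s):
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)
        self._ensure_gap(len(s))
        for ch in s:
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos, n):
        if n < 0 or not 0 <= pos <= len(self) - n:
            raise IndexError("delete range out of range")
        self._move_gap(pos)
        self._gap_end += n  # the deleted characters simply join the gap
```

For example, `GapBuffer("hello world")` followed by `insert(5, ",")` yields `"hello, world"` when converted with `str()`.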

If you want to build it yourself, I doubt it’s possible to make a pure-Python 
version that’s efficient enough for real use in CPython; you’d probably need a 
C accelerator just to avoid the cost of boxing and unboxing between ints and 
single-char strings for most operations. However, you probably could build a 
minimal “ucs4array” class with a C accelerator and then build most of the str 
API on top of that in pure Python. (Or, if you want the space efficiency of 
CPython strings, you need ucs1/ucs2/ucs4array types and a str that switches 
between them.)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Z7MCYC3P3A6T2X67TO6XZA7LINIXDS7W/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Paul Sokolovsky
Hello,

On Mon, 30 Mar 2020 09:58:32 -0700
Brett Cannon  wrote:

> On Sun, Mar 29, 2020 at 10:58 AM Paul Sokolovsky 
> wrote:
> 
> > [SNIP]
> >
> > 1. Succumb to applying the same mis-optimization for string type as
> > CPython3. (With the understanding that for speed-optimized projects,
> > implementing mis-optimizations will eat into performance budget, and
> > for memory-optimized projects, it likely will lead to noticeable
> > memory bloat.)
> > [SNIP]
> >
> > 1. The biggest "criticism" I see is a response a-la "there's no
> > problem with CPython3, so there's nothing to fix". This is related
> > to a bigger question, "whether a life outside CPython exists", or
> > put more formally, where's the border between Python-the-language
> > and CPython-the-implementation. To address this point, I tried to
> > collect performance stats for a pretty wide array of Python
> > implementations. 
> 
> I don't think characterizing this as a "mis-optimization" is fair.
> There is use of in-place add with strings in the wild and CPython
> happens to be able to optimize for it.

Not everyone has to agree with that characterization, nor is there
any strong need to be offended that it's "unfair". After all, it's just
somebody's opinion. Roughly speaking, the need to be upset by the "mis-"
prefix is about the same as the need to be upset by "bad" in some random
blog post, e.g. https://snarky.ca/my-impressions-of-elm/

I'm also sure that people familiar with implementation details would
understand the "mis-" prefix, but otherwise let me be explicit: a
string is one of the fundamental types in many languages, including
Python, and trying to make it too many things at once has its
overheads. Roughly speaking, to support efficient appending, one needs to
be ready to over-allocate string storage, and maintain bookkeeping for
this. Another known optimization CPython does is for stuff like "s =
s[off:]", which requires maintaining another "offset" pointer. Even
with this simplistic consideration, internal structure of "str" would
be about the same as "io.StringIO" (which also needs to over-allocate
and maintain "current offset" pointer). But why, if there's io.StringIO
in the first place?
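The over-allocation bookkeeping described above is the same amortized-growth trick CPython's list uses for append(). As a rough, CPython-specific illustration (it relies on sys.getsizeof() reporting a list's allocated capacity, and says nothing about how str itself is implemented), counting how often the reported size changes shows that most appends reuse spare room:

```python
import sys

items = []
sizes = []
for i in range(1000):
    items.append(i)
    # For a list this reflects allocated capacity, not just len(items).
    sizes.append(sys.getsizeof(items))

# A change in the reported size marks a reallocation; all other appends
# reused capacity that an earlier reallocation had set aside.
reallocations = sum(1 for before, after in zip(sizes, sizes[1:]) if before != after)
print(f"{len(items)} appends, {reallocations} reallocations")
```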

> Someone was motivated to do
> the optimization so we took it without hurting performance for other
> things. There are plenty of other things that I see people regularly do
> that I don't personally think are best practices, but that doesn't mean
> we should automatically ignore them and not help make their code more
> performant if possible without sacrificing best practice performance.

Nowhere did I argue against applying that optimization in CPython.
Surely, in general, the more optimizations, the better. I just stated
the fact that of 8 (well, 11, 11!) Python'ish implementations surveyed,
only 1 implemented it.

And what was implied is that even under ideal conditions, where other
implementations say "we have the resources to implement and maintain that
optimization" (we're still talking about the "str +=" optimization), at
least some projects would find it against their interests. E.g.
MicroPython, Pycopy, Snek optimize for memory usage, TinyPy for
simplicity of implementation. "Too-complex basic types" are also a
known problem for JITs (which become less performant due to need to
handle multiple cases of the same primitive type and much harder to
develop and debug).

At the same time, ergonomics of "str +=" is very good (heck, that's why
people use it). So, I was looking for the simplest possible
change which would allow for the largest part of that ergonomics in an
object type more suitable for content accumulation *across* different
Python'ish implementations.

I have to admit that I was inspired to write down this RFC by PEP 616
"String methods to remove prefixes and suffixes". Who'd have thought that
after so many years there's still something useful to be added to
string methods (and that it doesn't have to be as complex as one
can devise at full throttle, but can be much simpler than that)?

> And I'm not sure if you're trying to insinuate that CPython represents
> Python the language 

That's an old and painful (to some) topic.

> and thus needs to not optimize for something other
> implementations have/can not optimize for, which if you are

As I clarified, I don't say that CPython shouldn't optimize for things.
I just tried to argue that there's no clearly defined abstraction (*)
for accumulating string buffer, and argued that it could be easily
"established".

(*) Instead, there are various practical hacks to implement it, as
both the 2006 thread and this one show.

> suggesting that then I have an uncomfortable conversation I need to
> have with PyPy.
> Or if you're saying CPython and Python should be
> considered separate, then why can't CPython optimize for something it
> happens to be positioned to optimize for that other implementations
> can't/haven't?

Yes, I personally think that CPython and Python should be
considered 

[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Chris Angelico
On Tue, Mar 31, 2020 at 5:10 AM Serhiy Storchaka  wrote:
>
> 30.03.20 20:27, Chris Angelico wrote:
> >  def __iadd__(self, s): self.data.append(s)
>
> __iadd__ should return self.
>

And that's what I get for quickly whipping something up and not
testing it. Good catch. But you get the idea - a simple wrapper around
a *list* is going to be way better than a wrapper around StringIO.
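
The bug Serhiy spotted is easy to reproduce: augmented assignment rebinds the target to whatever `__iadd__` returns, so a method that falls off the end turns the builder into `None`. A minimal sketch with hypothetical class names:

```python
class BrokenBuilder:
    def __init__(self):
        self.data = []

    def __iadd__(self, s):
        self.data.append(s)  # no return statement: implicitly returns None


class StringBuilder:
    def __init__(self):
        self.data = []

    def __iadd__(self, s):
        self.data.append(s)
        return self  # `buf += s` rebinds buf to this return value

    def __str__(self):
        return "".join(self.data)


broken = BrokenBuilder()
broken += "foo"  # broken is now None

buf = StringBuilder()
buf += "foo"
buf += "bar"
print(buf)  # foobar
```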

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/TQIP5EKI3ZZ66LQNQW4YWQ53D6PO6YZN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Serhiy Storchaka

30.03.20 20:27, Chris Angelico wrote:

>     def __iadd__(self, s): self.data.append(s)

__iadd__ should return self.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FWP5SD42OLX55XJ7HG5JX5D3FBAQ44ZP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Serhiy Storchaka

30.03.20 20:07, Andrew Barnert via Python-ideas wrote:
> Sadly, this isn’t possible. Large amounts of C code—including builtins and stdlib—won’t let you duck type as a string; it will do a type check and expect an actual str (and if you subclass str, it will ignore your methods and use the PyUnicode APIs to get your base class’s storage directly as a buffer instead). So, no type, either C or Python, can really be a drop-in replacement for str. At best you can have something that you have to call str() on half the time.


I agree with this. It is not possible with the current PyUnicode 
implementation and the current C API. And even if we can make it 
possible for most cases, it will significantly complicate the code and 
the benefit will likely be not worth the cost.



> That’s why there’s no MutableStr on PyPI, and no UTF8Str, no EncodedStr that
> can act as both a bytes and a str by remembering its encoding (Nick Coghlan’s
> motivating example for changing this back in the early 3.x days), etc.


It is not so hard to implement EncodedStr (but it will not look like you
expect). I was going to add it and did some preparations which make it
possible. You just have to add the __bytes__ method to a string subclass
to make bytes(encoded_str) work (it might be enough for my purposes).
Or add support for the buffer protocol if you want broader compatibility
with bytes, but you cannot do this in pure Python. I abandoned this
idea because the need (compatibility with some Python 2 pickles) was not
large.
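
A minimal sketch of the `__bytes__` approach described above, assuming a str subclass that stores its encoding in an instance attribute. `EncodedStr` here is a hypothetical illustration, not an existing class; on recent CPython versions, `bytes()` consults `__bytes__` before rejecting a str argument that has no encoding:

```python
class EncodedStr(str):
    """A str that remembers an encoding (hypothetical sketch)."""

    def __new__(cls, value, encoding="utf-8"):
        self = super().__new__(cls, value)
        self.encoding = encoding  # str subclasses get a __dict__
        return self

    def __bytes__(self):
        # bytes(obj) calls this; encode with the remembered codec.
        return self.encode(self.encoding)


s = EncodedStr("héllo", "latin-1")
print(isinstance(s, str), bytes(s))  # True b'h\xe9llo'
```

It still is a real `str` everywhere C code checks for one; only `bytes(s)` gains the extra behavior, which matches the "it will not look like you expect" caveat.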

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GFR2BFW6BO2FITK6G7JY2VQZJW3JN33W/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Chris Angelico
On Tue, Mar 31, 2020 at 4:20 AM Joao S. O. Bueno  wrote:
>
> Hi Andrew  -
>
> I made my previous post before reading your first answer.
>
> So, anyway, what we have is that for a "mutable string like object" one
> is free to build his wrapper - StringIO based or not - put it on PyPI, and
> remember to call `str()` on it before having it leave your code.
>
> Thank you for the lengthy reply anyway.
>
> That said, can anyone point to small, efficient, well-maintained
> "mutable string" classes on PyPI?
>

There's a vast difference between "mutable string" and "string
builder". The OP was talking about this kind of thing:

    buf = ""
    for i in range(5):
        buf += "foo"
    print(buf)

And then suggested using a StringIO for that purpose. But if you're
going to change your API, just use a list:

    buf = []
    for i in range(5):
        buf.append("foo")
    buf = "".join(buf)
    print(buf)

So if you really want a drop-in replacement, don't build it around
StringIO, build it around list.

    class StringBuilder:
        def __init__(self): self.data = []
        def __iadd__(self, s):
            self.data.append(s)
            return self  # required: `buf += s` rebinds buf to this value
        def __str__(self): return "".join(self.data)

This is going to outperform anything based on StringIO fairly easily,
plus it's way WAY simpler.
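
The performance claim is easy to check on a given implementation. A rough benchmarking sketch, assuming a hypothetical `StringIOBuilder` as the StringIO-based counterpart; no particular timings are claimed here, since results vary across implementations and versions:

```python
import io
import timeit


class StringBuilder:
    # List-backed builder, as described above.
    def __init__(self):
        self.data = []

    def __iadd__(self, s):
        self.data.append(s)
        return self

    def __str__(self):
        return "".join(self.data)


class StringIOBuilder:
    # Hypothetical StringIO-backed equivalent, for comparison only.
    def __init__(self):
        self.buf = io.StringIO()

    def __iadd__(self, s):
        self.buf.write(s)
        return self

    def __str__(self):
        return self.buf.getvalue()


def fill(builder_cls, n=10_000):
    b = builder_cls()
    for _ in range(n):
        b += "foo"
    return str(b)


if __name__ == "__main__":
    for cls in (StringBuilder, StringIOBuilder):
        t = timeit.timeit(lambda: fill(cls), number=20)
        print(f"{cls.__name__}: {t:.3f}s")
```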

But this is *not* a mutable string. It's a string builder. If you want
a mutable string, first figure out exactly what mutations you need,
and what performance you are willing to accept.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/X5OQOV3FKEXVVXRHHWTMNQY5OLYSOFKA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Andrew Barnert via Python-ideas
On Mar 30, 2020, at 10:01, Brett Cannon  wrote:
> 
> I don't think characterizing this as a "mis-optimization" is fair. There is 
> use of in-place add with strings in the wild and CPython happens to be able 
> to optimize for it. Someone was motivated to do the optimization so we took 
> it without hurting performance for other things. There are plenty of other 
> things that I see people do regularly that I don't personally think are best
> practices, but that doesn't mean we should automatically ignore them and not
> help make their code more performant if possible without sacrificing best 
> practice performance.

Yes. A big part of the reason there’s so much use in the wild is that for small 
cases that aren’t in the middle of a bottleneck, it’s perfectly reasonable for 
people to add two or three strings and not care about performance. (Who cares 
about N**2 when N<=15 and it happens at most 4 times per run of your program?) 
So people do it, and it’s fine. When they really do need to optimize, a quick 
search of the FAQ or StackOverflow or whatever will tell them the right way to 
do it, and they do it, but most of the time it doesn’t matter.

So when CPython at some point optimized str concatenation and made a bunch of 
scripts 1% faster, most people didn’t notice, and of course they wouldn’t have 
complained if they had.

Maybe the OP could argue that this was a bad decision by finding examples of 
code that actually relies on that optimization despite being intended to be 
portable to other implementations. It’s worth comparing the case of calling sum 
on strings—which is potentially abused more often than used harmlessly, so 
instead of optimizing it, CPython made it an error. But without any such known 
examples, it’s hard not to call the string concatenation optimization a win.
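
The `sum` case is easy to see: passing a str start value to `sum()` raises `TypeError`, and the message points at `str.join` instead. A quick check:

```python
parts = ["a", "b", "c"]

message = None
try:
    sum(parts, "")
except TypeError as exc:
    # CPython refuses to sum strings and suggests ''.join(seq).
    message = str(exc)

joined = "".join(parts)
print(message)
print(joined)  # abc
```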

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZTTB73LXPPGB7QD7IQJ65PV2VWV5FTNV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Joao S. O. Bueno
Hi Andrew  -

I made my previous post before reading your first answer.

So, anyway, what we have is that for a "mutable string like object" one
is free to build his wrapper - StringIO based or not - put it on PyPI, and
remember to call `str()` on it before having it leave your code.

Thank you for the lengthy reply anyway.

That said, can anyone point to small, efficient, well-maintained
"mutable string" classes on PyPI?

On Mon, 30 Mar 2020 at 14:07, Andrew Barnert  wrote:

> On Mar 30, 2020, at 08:29, Joao S. O. Bueno  wrote:
> >
> > 
> > I agree with the arguments the OP brings forward.
> >
> > Maybe it should be the case of having a `StringIO` and `BytesIO` subclass?
> > Or better yet, just a class that wraps those and hides away the other
> > file-like methods and behaviors?
>
> Why? What’s the benefit of building a mutable string around a virtual file
> object wrapped around a buffer (with all the extra complexities and
> performance costs that involves, like incremental Unicode encoding and
> decoding) instead of just building it around a buffer directly?
>
> Also, how can you implement an efficient randomly-accessible mutable
> string object on top of a text file object? Text files don’t do
> constant-time random-access seek to character positions; they can only seek
> to the opaque tokens returned by tell. (This should be obvious if you think
> about how you could seek to the 137th character in a UTF-8 file without
> reading all of the first 137 characters.) (In fact, recent versions of
> CPython optimize StringIO so it only fakes being a TextIOWrapper around a
> BytesIO and actually uses a Py_UCS4* buffer for storage, but that’s
> CPython-specific, not guaranteed, and not accessible from Python even in
> CPython.)
>
> And, even if that were a good idea for implementation reasons, why should
> the user care? If they need a mutable string, why do they care whether you
> give them one that inherits from or delegates to a StringIO instead of a
> list or an array.array of int32 or the CPython string buffer API (whether
> accessed via a C extension or ctypes.pythonapi) or a pure C library with
> its own implementation and optimizations?
>
> More generally, a StringIO is neither the obvious way nor the fastest way
> nor the recommended way to build strings on the fly in Python, so why do
> you agree with the OP that we need to make it better for that purpose? Just
> to benefit people who want to write C++ instead of Python? If the goal is
> to cater to people who won’t read the docs to learn the right way, the
> obvious solution is to mandate the non-quadratic string concatenation of
> CPython for all implementations, not to give them yet another way of doing
> it and hope they’ll guess or look up that one even though they didn’t guess
> or look up the long-standing existing one.
>
> > That would keep the new class semantically as a string,
> > and they could implement all of the str/bytes methods and attributes
> > so as to be a drop-in replacement
>
> Sadly, this isn’t possible. Large amounts of C code—including builtins and
> stdlib—won’t let you duck type as a string; it will do a type check and
> expect an actual str (and if you subclass str, it will ignore your methods
> and use the PyUnicode APIs to get your base class’s storage directly as a
> buffer instead). So, no type, either C or Python, can really be a drop-in
> replacement for str. At best you can have something that you have to call
> str() on half the time. That’s why there’s no MutableStr on PyPI, and no
> UTF8Str, no EncodedStr that can act as both a bytes and a str by
> remembering its encoding (Nick Coghlan’s motivating example for changing
> this back in the early 3.x days), etc.
>
> Fixing this cleanly would probably require splitting the string C API into
> abstract and concrete versions a la sequence and then changing a ton of
> code to respect abstract strings (to only optimize for concrete ones rather
> than requiring them, again like sequences). Fixing it slightly less cleanly
> with a hookable API might be more feasible (I’m pretty sure Nick Coghlan
> looked into it before the 3.3 string redesign; I don’t know if anyone has
> since), but it’s still probably a major change.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5IQR5PAHUFVEZU3T3NWZ6LWLECBFP42D/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Andrew Barnert via Python-ideas
On Mar 30, 2020, at 08:29, Joao S. O. Bueno  wrote:
> 
> 
> I agree with the arguments the OP brings forward.
> 
> Maybe it should be the case of having a `StringIO` and `BytesIO` subclass?
> Or better yet, just a class that wraps those and hides away the other
> file-like methods and behaviors?

Why? What’s the benefit of building a mutable string around a virtual file 
object wrapped around a buffer (with all the extra complexities and performance 
costs that involves, like incremental Unicode encoding and decoding) instead of 
just building it around a buffer directly?
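
To make "building it around a buffer directly" concrete, here is a minimal sketch of a mutable string backed by a plain Python list of one-character strings. The class name `MutableStr` is hypothetical (not a PyPI package), and a list of characters is only one possible buffer; indexing and item assignment are constant-time, unlike seeking in a text file:

```python
class MutableStr:
    """Mutable text backed by a list of single characters (sketch)."""

    def __init__(self, text=""):
        self._chars = list(text)

    def __len__(self):
        return len(self._chars)

    def __getitem__(self, index):
        item = self._chars[index]
        # A slice yields a list of chars, which is joined back into a str.
        return "".join(item) if isinstance(index, slice) else item

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._chars[index] = list(value)
        else:
            if len(value) != 1:
                raise ValueError("can only assign a single character")
            self._chars[index] = value

    def __iadd__(self, text):
        self._chars.extend(text)
        return self

    def __str__(self):
        return "".join(self._chars)


s = MutableStr("hello")
s[0] = "J"
s += " world"
s[6:] = "there"
print(s)  # Jello there
```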

Also, how can you implement an efficient randomly-accessible mutable string 
object on top of a text file object? Text files don’t do constant-time 
random-access seek to character positions; they can only seek to the opaque 
tokens returned by tell. (This should be obvious if you think about how you 
could seek to the 137th character in a UTF-8 file without reading all of the 
first 137 characters.) (In fact, recent versions of CPython optimize StringIO 
so it only fakes being a TextIOWrapper around a BytesIO and actually uses a 
Py_UCS4* buffer for storage, but that’s CPython-specific, not guaranteed, and 
not accessible from Python even in CPython.)
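
The opaque-token behavior of text files can be seen with an in-memory `io.TextIOWrapper` over UTF-8 bytes. A small sketch: reading five characters and calling `tell()` gives a cookie that can be passed back to `seek()`, but its numeric value is an implementation detail, not a character index:

```python
import io

# A text stream over UTF-8 bytes; "é" and "ö" take two bytes each.
raw = io.BytesIO("héllo wörld".encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8")

first = text.read(5)   # five characters: "héllo"
cookie = text.tell()   # opaque position token, not a character index
rest = text.read()     # " wörld"

text.seek(cookie)      # seeking is only meaningful with a token from tell()
again = text.read()    # " wörld" again
```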

And, even if that were a good idea for implementation reasons, why should the 
user care? If they need a mutable string, why do they care whether you give 
them one that inherits from or delegates to a StringIO instead of a list or an 
array.array of int32 or the CPython string buffer API (whether accessed via a C 
extension or ctypes.pythonapi) or a pure C library with its own implementation 
and optimizations?

More generally, a StringIO is neither the obvious way nor the fastest way nor 
the recommended way to build strings on the fly in Python, so why do you agree 
with the OP that we need to make it better for that purpose? Just to benefit 
people who want to write C++ instead of Python? If the goal is to cater to 
people who won’t read the docs to learn the right way, the obvious solution is 
to mandate the non-quadratic string concatenation of CPython for all 
implementations, not to give them yet another way of doing it and hope they’ll 
guess or look up that one even though they didn’t guess or look up the 
long-standing existing one.

> That would keep the new class semantically as a string,
> and they could implement all of the str/bytes methods and attributes 
> so as to be a drop-in replacement 

Sadly, this isn’t possible. Large amounts of C code—including builtins and 
stdlib—won’t let you duck type as a string; it will do a type check and 
expect an actual str (and if you subclass str, it will ignore your methods and 
use the PyUnicode APIs to get your base class’s storage directly as a buffer 
instead). So, no type, either C or Python, can really be a drop-in replacement 
for str. At best you can have something that you have to call str() on half the 
time. That’s why there’s no MutableStr on PyPI, and no UTF8Str, no EncodedStr 
that can act as both a bytes and a str by remembering its encoding (Nick 
Coghlan’s motivating example for changing this back in the early 3.x days), etc.
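
The duck-typing limitation can be observed directly. A small sketch with two hypothetical classes: a non-str object that delegates to a real str is rejected by `str.join`, and a str subclass has its overridden `__str__` bypassed by `str.join`, which reads the underlying string data:

```python
class StrLike:
    """Hypothetical non-str object that delegates to a real str."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value

    def __len__(self):
        return len(self._value)


class OverridingStr(str):
    """Hypothetical str subclass that overrides __str__."""

    def __str__(self):
        return "overridden"


duck = StrLike("foo")
try:
    "-".join([duck, "bar"])  # C code type-checks each item
    duck_accepted = True
except TypeError:
    duck_accepted = False

joined_after_str = "-".join([str(duck), "bar"])  # explicit str() works

sub = OverridingStr("foo")
joined_sub = "".join([sub])  # reads the str data, ignores __str__
explicit = str(sub)          # str() does call the override
```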

Fixing this cleanly would probably require splitting the string C API into 
abstract and concrete versions a la sequence and then changing a ton of code to 
respect abstract strings (to only optimize for concrete ones rather than 
requiring them, again like sequences). Fixing it slightly less cleanly with a 
hookable API might be more feasible (I’m pretty sure Nick Coghlan looked into 
it before the 3.3 string redesign; I don’t know if anyone has since), but it’s 
still probably a major change.


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3EPSLFWDAOHKBXST6HYZIXPJHPNNMB6R/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Brett Cannon
On Sun, Mar 29, 2020 at 10:58 AM Paul Sokolovsky  wrote:

> [SNIP]
>
> 1. Succumb to applying the same mis-optimization for string type as
> CPython3. (With the understanding that for speed-optimized projects,
> implementing mis-optimizations will eat into performance budget, and
> for memory-optimized projects, it likely will lead to noticeable
> memory bloat.)
> [SNIP]
>
> 1. The biggest "criticism" I see is a response a-la "there's no problem
> with CPython3, so there's nothing to fix". This is related to a bigger
> question, "whether a life outside CPython exists", or, put more
> formally, where's the border between Python-the-language and
> CPython-the-implementation. To address this point, I tried to collect
> performance stats for a pretty wide array of Python implementations.
>

I don't think characterizing this as a "mis-optimization" is fair. There is
use of in-place add with strings in the wild and CPython happens to be able
to optimize for it. Someone was motivated to do the optimization so we took
it without hurting performance for other things. There are plenty of other
things that I see people do regularly that I don't personally think are best
practices, but that doesn't mean we should automatically ignore them and not
help make their code more performant if possible without sacrificing best
practice performance.

And I'm not sure if you're trying to insinuate that CPython represents
Python the language and thus needs to not optimize for something other
implementations have not or cannot optimize for, which, if you are suggesting that,
then I have an uncomfortable conversation I need to have with PyPy. Or
if you're saying CPython and Python should be considered separate, then why
can't CPython optimize for something it happens to be positioned to
optimize for that other implementations can't/haven't?
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QOGV74QJZE3DP26NFAHNNIRWZRSHIIPJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Limit 'import as' syntax

2020-03-30 Thread joperez
There should be one-- and preferably only one --obvious way to do it. (from The 
Zen of Python https://www.python.org/dev/peps/pep-0020/)
However, in something as basic as import syntax, that's not the case. This 
example comes from PEP 221 (https://www.python.org/dev/peps/pep-0221/) :
A slightly special case exists for importing sub-modules. The statement
`import os.path`
stores the module os locally as os, so that the imported submodule path is 
accessible as os.path. As a result,
`import os.path as p`
stores os.path, not os, in p. This makes it effectively the same as
`from os import path as p`
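
In the common case, both spellings do end up binding the same module object, which is easy to confirm. This is a minimal check only; replies in this thread point out cases where the two forms differ:

```python
import os
import os.path as p1
from os import path as p2

# Both statements bind the very same module object (posixpath or ntpath).
print(p1 is p2 is os.path)  # True
```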

Not only does it not respect the Zen of Python, it is also quite
counterintuitive, because, as explained in the PEP, the behavior of `import
os.path as p` is not the same as that of `import os.path`, while `from os import
path as p` is consistent with or without `as`.
There is one case where `import ... as ...` is consistent (and justified IMHO):
statements like `import _thread as thread`, where only the imported object
is aliased (as `from ... import ... as ...` does).

Looking at the standard library, only a few dozen lines match the regex
`^import \w+\.(\w|\.)+ as`, while the other (equivalent) form has hundreds of
matches.

That's why I propose to restrict the aliased import statement (`import ... as
...`) so that it cannot alias an imported submodule, leaving `from ... import
... as ...` as the only statement able to do it.
The roadmap could be to deprecate the statement with a warning in the next few
releases, and finally remove the syntax.
(hoping my English is okay)
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4OI5CGD6J5TLTAVFXIXH6XCJY34P3WNY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Serhiy Storchaka

30.03.20 15:50, jope...@hotmail.fr wrote:

> As a result,
> `import os.path as p`
> stores os.path, not os, in p. This makes it effectively the same as
> `from os import path as p`

No, it is not the same.

For example, `from os import mkdir as mkd` works, but `import os.mkdir 
as mkd` is an error.
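
The counterexample can be reproduced in a few lines; `exec` is used only so that the failing statement can be caught:

```python
from os import mkdir as mkd  # fine: mkdir is an attribute of the os module

error = None
try:
    # `import a.b` requires b to be a submodule, and os is not a package.
    exec("import os.mkdir as mkd2")
except ImportError as exc:  # ModuleNotFoundError subclasses ImportError
    error = exc

print(type(error).__name__)  # ModuleNotFoundError
```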

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SAEYXOHZGHYG7I34TOO22FK5X4HTXDYD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread Joao S. O. Bueno
I agree with the arguments the OP brings forward.

Maybe it should be the case of having a `StringIO` and `BytesIO` subclass?
Or better yet, just a class that wraps those and hides away the other
file-like methods and behaviors?

That would keep the new class semantically a string,
and it could implement all of the str/bytes methods and attributes
so as to be a drop-in replacement - _and_ add a proper `__setitem__` so
that one could have a proper "mutable string". It would just use
StringIO/BytesIO as its "engine".


Such code would take about 100 lines (most of them just to
forward/reimplement some of the legacy str methods), be an effective
drop-in replacement, and require no change to Python - it could even be
put on PyPI now - and maybe even reach Python 3.9 in time, because, as I
said, I agree with your points.
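
For reference, the smallest form of the idea under discussion (a `+=` on top of StringIO) might look like the sketch below. It assumes a plain subclass rather than the wrapper described above, and omits the forwarded str methods and `__setitem__`; `AppendableStringIO` is a hypothetical name:

```python
import io


class AppendableStringIO(io.StringIO):
    """io.StringIO with a `+=` shorthand for write() (hypothetical sketch)."""

    def __iadd__(self, s):
        self.write(s)
        return self  # keep `buf += s` bound to the same object


buf = AppendableStringIO()
for _ in range(5):
    buf += "foo"
result = buf.getvalue()
```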





On Mon, 30 Mar 2020 at 12:06,  wrote:

> I completely agree with Andrew Barnert.
>
> I just want to add a little comment about overriding the `+=` (and `+`)
> operator for StringIO. Since StringIO is a stream --not a string--, I think
> `StringIO` should continue to use the common interface for streams in
> Python. `write()` and `read()` are fine for streams (and files) and you can
> find similar `write` and `read` functions in other languages. I cannot see
> any advantage on departing from this convention.
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JA5CVGSH5HAUFMXVGMSU6L6JZH2FY2QI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Joseph Perez
As pointed out in the replies, I had not thought my point through enough to see
that there can be a slight difference between the two statements.
This thread is no longer relevant.
Thank you
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K7CX6HOQYT2HZQH3WYHXXEQYTJ4WTTDF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Joseph Perez
You are right, I did not envisage the case where there could be a name clash
between a submodule and a variable inside the package's __init__.py, which could
lead to different behavior. So my statement is erroneous and this thread is no
longer relevant.
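
Such a name clash can be demonstrated with a throwaway package. A sketch under stated assumptions: the package name `clashpkg` and its contents are hypothetical, created in a temporary directory. Because `from clashpkg import sub` finds the existing attribute `sub` defined in `__init__.py`, it never imports the submodule, while `import clashpkg.sub` imports the submodule and rebinds the attribute. The result of the `from` form therefore depends on import order:

```python
import importlib
import os
import sys
import tempfile


def demo_name_clash():
    """Import a package whose __init__.py shadows a submodule name."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "clashpkg")  # hypothetical package name
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write('sub = "a plain string from __init__.py"\n')
        with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
            f.write("VALUE = 42\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            ns = {}
            # Finds the existing attribute `sub`; the submodule is NOT imported.
            exec("from clashpkg import sub as a", ns)
            # Imports clashpkg.sub, which also rebinds the attribute clashpkg.sub.
            exec("import clashpkg.sub as b", ns)
            return ns["a"], ns["b"]
        finally:
            sys.path.remove(root)
            sys.modules.pop("clashpkg.sub", None)
            sys.modules.pop("clashpkg", None)


shadowed, submodule = demo_name_clash()
```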
Thank you
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/O25CKD6WEVFWIMHJEITJ7BAUVUTG7ZCE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Joao S. O. Bueno
Maybe you should file some feature requests with the linter projects, like
flake8, so that they have an option to distinguish both ways, and one
could point out what the "preferred way" is for a given project.

But I don't see any sense in even putting in a PEP 8 recommendation for this.

On Mon, 30 Mar 2020 at 12:11, Joao S. O. Bueno 
wrote:

> The part of "as X" of either "import foo.bar as X" or "from foo import bar
> as X" does _one thing_ and is fully self-consistent.
>
> The part of  "import foo.bar" and "from foo import bar"  does different
> things, that sometimes are interchangeable,
> and in some cases may have different results - however, these are well
> stablished, and overall, they don't
> even "care" or "know" if the imported part is to be renamed with an "as X"
> complement.
>
> For me that is "one obvious way to do it" and there is nothing of
> "oounerintuitive" on that.
>
> Also, trying to change or limit this now, besides blocking behaviors that
> sometimes are needed,
> would introduce severe backwards incompatibility for no gain at all.
>
>   js
>  -><-
>
> On Mon, 30 Mar 2020 at 11:54, Joseph Perez  wrote:
>
>> There is no advantage other than respecting the Zen of Python (and I
>> don't know how much that counts). Maybe it could simplify the interpreter
>> code, but I don't know about that and I doubt it.
>> Besides, it could help newcomers to Python choose between the two
>> syntaxes. (And I've already experienced team conflicts about syntax.)
>> By the way, I think this issue is not fundamental; that's why a removal
>> would maybe actually be too strong.
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/55AZVP524SE3PNNET74VGJQ3KQQQXQXB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Joao S. O. Bueno
The part of "as X" of either "import foo.bar as X" or "from foo import bar
as X" does _one thing_ and is fully self-consistent.

The part of  "import foo.bar" and "from foo import bar"  does different
things, that sometimes are interchangeable,
and in some cases may have different results - however, these are well
stablished, and overall, they don't
even "care" or "know" if the imported part is to be renamed with an "as X"
complement.

For me that is "one obvious way to do it" and there is nothing of
"oounerintuitive" on that.

Also, trying to change or limit this now, besides blocking behaviors that
sometimes are needed,
would introduce severe backwards incompatibility for no gain at all.

  js
 -><-

On Mon, 30 Mar 2020 at 11:54, Joseph Perez  wrote:

> There is no advantage other than respecting the Zen of Python (and I don't
> know how much that counts). Maybe it could simplify the interpreter code,
> but I don't know about that and I doubt it.
> Besides, it could help newcomers to Python choose between the two
> syntaxes. (And I've already experienced team conflicts about syntax.)
> By the way, I think this issue is not fundamental; that's why a removal
> would maybe actually be too strong.
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/YBO44XWDMY35CQ4ESPFIUZXWE4VNPF6Q/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

2020-03-30 Thread jdveiga
I completely agree with Andrew Barnert.

I just want to add a little comment about overriding the `+=` (and `+`) 
operator for StringIO. Since StringIO is a stream --not a string--, I think 
`StringIO` should continue to use the common interface for streams in Python. 
`write()` and `read()` are fine for streams (and files) and you can find 
similar `write` and `read` functions in other languages. I cannot see any 
advantage on departing from this convention.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/LFGQVJDOGBBJ7CIYHISM4X4IZDWLGFII/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Joseph Perez
There is no advantage other than respecting the Zen of Python (and I don't know
how much that counts). Maybe it could simplify the interpreter code, but I don't
know about that and I doubt it.
Besides, it could help newcomers to Python choose between the two
syntaxes. (And I've already experienced team conflicts about syntax.)
By the way, I think this issue is not fundamental; that's why a removal would
maybe actually be too strong.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FD5PIOEUFEQQWY475TYZPZF5SOKDQW3S/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread jdveiga
joperez@hotmail.fr wrote:
> There should be one-- and preferably only one --obvious way to do it. (from
> The Zen of Python https://www.python.org/dev/peps/pep-0020/)
> However, in something as basic as import syntax, that's not the case. This
> example comes from PEP 221 (https://www.python.org/dev/peps/pep-0221/):
> A slightly special case exists for importing sub-modules. The statement
> `import os.path`
> stores the module os locally as os, so that the imported submodule path is
> accessible as os.path. As a result,
> `import os.path as p`
> stores os.path, not os, in p. This makes it effectively the same as
> `from os import path as p`
> Not only does it not respect the Zen of Python, it is also quite
> counterintuitive: as explained in the PEP, the behavior of
> `import os.path as p` is not the same as that of `import os.path`, while
> `from os import path as p` behaves consistently with or without `as`.
> There is one case where `import ... as ...` is consistent (and justified,
> IMHO): statements like `import _thread as thread`, where only the imported
> object is aliased (as `from ... import ... as ...` does).
> Looking at the standard library, only a few dozen lines match the regex
> `^import \w+\.(\w|\.)+ as`, while the other (equivalent) form has hundreds of
> matches.
> That's why I propose restricting the aliased import statement
> (`import ... as ...`) so that it cannot alias an imported submodule, leaving
> the `from ... import ... as ...` statement as the only way to do it.
> The roadmap could be to deprecate the statement with a warning over the next
> few releases, and finally to remove the syntax.
> (hoping my English is okay)

`import ...` and `from ... import ...` do not behave in the same manner, as 
explained in the docs: 
https://docs.python.org/3/reference/simple_stmts.html#import. So they are not 
equivalent statements.

It is true that `import os.path as p` and `from os import path as p` bind the 
same local name to the same object. However, they do so in quite different 
ways, and this difference can be relevant, for instance, when you are dealing 
with circular imports (OK, I cannot remember an example of this right now).

So I do not see how they violate any principle in PEP 20, "The Zen of 
Python". Anyway, the Zen of Python is an inspirational document, not a law. 
Even if it were a law, every law has its exceptions, and PEP 221, "Import As", 
presents and explains one useful exception. In my opinion...

Thank you.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/MGLG5YJMDLDU3LHCJP2POCVFW33342LB/


[Python-ideas] Re: Limit 'import as' syntax

2020-03-30 Thread Chris Angelico
On Tue, Mar 31, 2020 at 12:12 AM  wrote:
> That's why I propose restricting the aliased import statement
> (`import ... as ...`) so that it cannot alias an imported submodule, leaving
> the `from ... import ... as ...` statement as the only way to do it.
> The roadmap could be to deprecate the statement with a warning over the next
> few releases, and finally to remove the syntax.
> (hoping my English is okay)
>

Can you elaborate a bit more on what the advantage of restricting and
removing this is?

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JEMHSKVVRHMUI5DXQ3NBIFUBRZVWKL27/


[Python-ideas] Limit 'import as' syntax

2020-03-30 Thread joperez
There should be one-- and preferably only one --obvious way to do it. (from The 
Zen of Python https://www.python.org/dev/peps/pep-0020/)
However, in something as basic as import syntax, that's not the case. This 
example comes from PEP 221 (https://www.python.org/dev/peps/pep-0221/):
A slightly special case exists for importing sub-modules. The statement
`import os.path`
stores the module os locally as os, so that the imported submodule path is 
accessible as os.path. As a result,
`import os.path as p`
stores os.path, not os, in p. This makes it effectively the same as
`from os import path as p`
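A quick way to check the three bindings the PEP describes is to run each statement in a fresh namespace dictionary with the built-in `exec()` and see which names appear. This is a sketch for illustration; `ns1`, `ns2` and `ns3` are just scratch namespaces.

```python
import sys

# Run each import statement in its own empty namespace so the names it
# binds can be inspected afterwards (exec also adds "__builtins__").
ns1, ns2, ns3 = {}, {}, {}
exec("import os.path", ns1)            # binds the top-level package `os`
exec("import os.path as p", ns2)       # binds the submodule to `p`, not `os`
exec("from os import path as p", ns3)  # binds the same submodule to `p`

print(sorted(k for k in ns1 if k != "__builtins__"))    # ['os']
print(sorted(k for k in ns2 if k != "__builtins__"))    # ['p']
print(ns2["p"] is ns3["p"] is sys.modules["os.path"])   # True
```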

Not only does it not respect the Zen of Python, it is also quite 
counterintuitive: as explained in the PEP, the behavior of 
`import os.path as p` is not the same as that of `import os.path`, while 
`from os import path as p` behaves consistently with or without `as`.
There is one case where `import ... as ...` is consistent (and justified, 
IMHO): statements like `import _thread as thread`, where only the imported 
object is aliased (as `from ... import ... as ...` does).

Looking at the standard library, only a few dozen lines match the regex 
`^import \w+\.(\w|\.)+ as`, while the other (equivalent) form has hundreds of 
matches.
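The regex above can be tried on a few sample lines with the standard `re` module (`re.MULTILINE` makes `^` match at each line start). The second pattern, `FROM_IMPORT_AS`, is a hypothetical companion for the `from ... import ... as ...` spelling; the post does not say which pattern it used for that count. It is deliberately loose and would also count aliased names that are not submodules.

```python
import re

# The regex quoted in the post: aliased imports of a dotted (submodule) name.
ALIASED_SUBMODULE = re.compile(r"^import \w+\.(\w|\.)+ as", re.MULTILINE)
# Hypothetical pattern for the "from package import name as alias" spelling.
FROM_IMPORT_AS = re.compile(r"^from [\w.]+ import \w+ as", re.MULTILINE)

sample = """\
import os.path as p
import xml.etree.ElementTree as ET
from os import path as p
import _thread as thread
from xml.etree import ElementTree as ET
"""

aliased = [m.group(0) for m in ALIASED_SUBMODULE.finditer(sample)]
from_forms = [m.group(0) for m in FROM_IMPORT_AS.finditer(sample)]
print(aliased)     # `import _thread as thread` is not matched: no dot
print(from_forms)
```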

That's why I propose restricting the aliased import statement 
(`import ... as ...`) so that it cannot alias an imported submodule, leaving 
the `from ... import ... as ...` statement as the only way to do it.
The roadmap could be to deprecate the statement with a warning over the next 
few releases, and finally to remove the syntax.
(hoping my English is okay)
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/K52ADPFC6YTLPAU4EIUGZYNWYZYWNKK4/