[Python-Dev] Re: PEP 467 feedback from the Steering Council
On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker wrote:

> On Tue, Aug 10, 2021 at 3:00 PM wrote:
>
>> The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
>
> I didn't read it that way, but if so, please no, I'd rather see the former use cases discouraged. ISTM that the Py2 string handling is still needed for working with mixed binary / text data -- but that should be a pretty specialized use case. Spelling the way to create a byte, byte() sure makes more sense in any other context.
>
>> ... anything where a C programmer would use an array of unsigned chars).
>
> or any programmer would use an array of unsigned 8bit integers :-) numpy spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`. My point is that for anyone not an "old time" C programmer, or even a Python2 programmer, the "character is an unsigned 8 bit int" concept is alien and confusing, not a helpful mnemonic.
>
>> For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
>
> no, though bytes([31]) isn't horrible ;-) (despite coding for over four decades, I'm still not comfortable with hex notation)
>
> I say it's not horrible, because bytes is a Sequence of bytes (or integer values between 0 and 255), initializing it with an iterable seems pretty reasonable, that's how we initialize most (all?) other sequences after all. And compatible with array.array and numpy arrays.
I consider bytes([31]) notation to be horrible API design because a simple, easy-to-make typo of omitting the [] or using () and forgetting the tupleizing comma turns it into a different valid call with an entirely different meaning: bytes([31]) vs bytes((31)) vs bytes(31).

It's also ugly to anyone who thinks about what bytecode is generated and executed in order to do it. An entire new list object with a single element referring to a tiny int is created and destroyed just to create a b'\037' object? An optimizer pass to fix that up at the bytecode level isn't easy, as it can only be done when it can prove that `bytes` has not been reassigned to something other than the builtin. That is near impossible in a lot of code. bytes.fromint(31) isn't much better in the bytecode regard, but at least a temporary list is not being created.

As much as I think that bytes(size: int) was a bad idea to have as an API - bytearray(size: int) is fine and useful as it is mutable - that ship sailed and getting rid of it would break some odd code. It doesn't have much use, so adding fromsize(size: int) methods doesn't sound very compelling, as it just adds yet another way to do the same thing. We should just live with that specific wart.

`bchr` as a builtin... I'm with the others on saying no to any new builtin that isn't expected to see frequent use. bchr won't see frequent use.

`bytes.fromint` seems fine. Others are proposing `bytes.byte` for that. I don't *like* to argue over names (the last stage of anything) but I do need to point out how that sounds to read. It falls victim to API stuttering. "bytes dot byte" or "bytes byte" doesn't convey much to a reader in English as the difference is a subtle "s". "bytes dot from int" or "bytes from int" is quite clear.
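To make the typo hazard described above concrete, here is a minimal sketch using only existing builtins. `bytes.fromint` is a proposal under discussion and is deliberately not called; the variable names are purely illustrative.

```python
# Three look-alike calls, two different meanings.
one_byte = bytes([31])   # iterable of ints -> a single byte with value 31
no_comma = bytes((31))   # (31) is just the int 31, not a tuple
by_size = bytes(31)      # int argument -> 31 zero bytes

print(one_byte)              # b'\x1f'
print(len(no_comma))         # 31
print(no_comma == by_size)   # True

# Spellings that already exist today and do yield the single byte:
with_comma = bytes((31,))            # the "tupleizing comma" restores the intent
via_int = (31).to_bytes(1, "big")    # int.to_bytes avoids the temporary list
print(with_comma == via_int == b'\037')  # True
```

Nothing here raises an error, which is the point being made: the mistaken forms are valid calls that silently produce 31 NUL bytes.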
(Avoiding stuttering in API design was popularized by golang - it's a good thing to strive for in any language.) It's times like this that I wish Python had chosen consistent camelCase, CapWords, or snake_case in all API names, as conjoinedwords aren't great. But they are sadly consistent with our past sins.

One thing never mentioned in the PEP: if you expect a primary use of fromint (aka the bchr builtin that isn't going to happen) to be calls on constant values, why are we adding name lookups and function calls to this? Why not address the elephant in the room and allow for decimal values to be written as an escape sequence within bytes literals?

b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255 with no leading zero should be accepted when parsing such an escape. (Do not bother adding the same feature for codepoints in unicode strs; leave that to later if someone shows actual demand.) This can't address the bytearray need, but that's been true of bytearray for ages; a common way to create them is via a copy from transient bytes objects. bytearray(b'\d31') isn't much different than bytearray.fromint(31). One less name lookup.

Why not add a \d escape? Introducing a new escape is fraught with peril, as existing \d's within b'' literals in code could change meaning. Backwards compatibility fail. But one that is easy to check for with
[Python-Dev] Re: PEP 467 feedback from the Steering Council
Hm, I don’t think the major use for bchr() will be with a constant.

On Sun, Aug 22, 2021 at 14:48 Gregory P. Smith wrote:

> On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker wrote:
>
>> On Tue, Aug 10, 2021 at 3:00 PM wrote:
>>
>>> The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
>>
>> I didn't read it that way, but if so, please no, I'd rather see the former use cases discouraged. ISTM that the Py2 string handling is still needed for working with mixed binary / text data -- but that should be a pretty specialized use case. Spelling the way to create a byte, byte() sure makes more sense in any other context.
>>
>>> ... anything where a C programmer would use an array of unsigned chars).
>>
>> or any programmer would use an array of unsigned 8bit integers :-) numpy spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`. My point is that for anyone not an "old time" C programmer, or even a Python2 programmer, the "character is an unsigned 8 bit int" concept is alien and confusing, not a helpful mnemonic.
>>
>>> For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
>>
>> no, though bytes([31]) isn't horrible ;-) (despite coding for over four decades, I'm still not comfortable with hex notation)
>>
>> I say it's not horrible, because bytes is a Sequence of bytes (or integer values between 0 and 255), initializing it with an iterable seems pretty reasonable, that's how we initialize most (all?) other sequences after all. And compatible with array.array and numpy arrays.
>
> I consider bytes([31]) notation to be horrible API design because a simple, easy-to-make typo of omitting the [] or using () and forgetting the tupleizing comma turns it into a different valid call with an entirely different meaning: bytes([31]) vs bytes((31)) vs bytes(31).
>
> It's also ugly to anyone who thinks about what bytecode is generated and executed in order to do it. An entire new list object with a single element referring to a tiny int is created and destroyed just to create a b'\037' object? An optimizer pass to fix that up at the bytecode level isn't easy, as it can only be done when it can prove that `bytes` has not been reassigned to something other than the builtin. That is near impossible in a lot of code. bytes.fromint(31) isn't much better in the bytecode regard, but at least a temporary list is not being created.
>
> As much as I think that bytes(size: int) was a bad idea to have as an API - bytearray(size: int) is fine and useful as it is mutable - that ship sailed and getting rid of it would break some odd code. It doesn't have much use, so adding fromsize(size: int) methods doesn't sound very compelling, as it just adds yet another way to do the same thing. We should just live with that specific wart.
>
> `bchr` as a builtin... I'm with the others on saying no to any new builtin that isn't expected to see frequent use. bchr won't see frequent use.
>
> `bytes.fromint` seems fine. Others are proposing `bytes.byte` for that. I don't *like* to argue over names (the last stage of anything) but I do need to point out how that sounds to read. It falls victim to API stuttering. "bytes dot byte" or "bytes byte" doesn't convey much to a reader in English as the difference is a subtle "s". "bytes dot from int" or "bytes from int" is quite clear.
> (Avoiding stuttering in API design was popularized by golang - it's a good thing to strive for in any language.) It's times like this that I wish Python had chosen consistent camelCase, CapWords, or snake_case in all API names, as conjoinedwords aren't great. But they are sadly consistent with our past sins.
>
> One thing never mentioned in the PEP: if you expect a primary use of fromint (aka the bchr builtin that isn't going to happen) to be calls on constant values, why are we adding name lookups and function calls to this? Why not address the elephant in the room and allow for decimal values to be written as an escape sequence within bytes literals?
>
> b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255 with no leading zero should be accepted when parsing such an escape. (Do not bother adding the same feature for codepoints in unicode strs; leave that to later if someone shows actual demand.) This can't address the bytearray need, but that's been true of bytearray for ages; a common way to create them is via a copy from transient bytes objects. bytearray(b'\d31') isn't much different t
[Python-Dev] Re: PEP 667: Consistent views of namespaces
On Sat, Aug 21, 2021 at 8:52 PM Nick Coghlan wrote:

> On Sun, 22 Aug 2021, 10:47 am Guido van Rossum, wrote:
>
>> Everything here is about locals() and f_locals in *function scope*. (I use f_locals to refer to the f_locals field of frame objects as seen from Python code.) And in particular, it is about what I'll call "extra variables": the current CPython feature that you can add *new* variables to f_locals that don't exist in the frame, for example:
>>
>>     def foo():
>>         x = 1
>>         locals()["y"] = 2  # or sys._getframe().f_locals["y"] = 2
>>
>> My first reaction was to propose to drop this feature, but I realize it's kind of important for debuggers to be able to execute arbitrary code in function code -- assignments to locals should affect the frame, but it should also be possible to create new variables (e.g. temporaries). So I agree we should keep this.
>
> I actually tried taking this feature out in one of the PEP 558 drafts, but actually doing so breaks the pdb test suite.

I wonder if we should reconsider this, given that so much of the complexity of the competing PEPs is due to this issue of "extra" variables. We can fix pdb, and other debuggers probably will be happy to make some changes (debuggers often are closely tied to the implementation anyway -- e.g. PyDev currently seems broken with 3.11).

One way to fix it would be to have the debugger use a mapping implementation that acts as a proxy for f_locals, but stores extra variables in its own storage. Maybe this just moves the problem, but I feel support for extra variables will always be a bit of a wart, and many uses of f_locals don't need them.

Another thing I feel we should at least have a good second look at is the "locals() returns a snapshot" behavior. This is the same in both PEPs but it is inconsistent with module and class scopes, as well as different from 3.10.
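The inconsistency between scopes can be seen with a small sketch. It shows only behaviour that holds on CPython regardless of version (what a function-scope locals() dict retains between calls does vary between versions, so that part is left out). The names `Example` and `foo` are illustrative.

```python
class Example:
    # In a class body, locals() is the live class namespace,
    # so this store creates a real attribute.
    locals()["y"] = 2


def foo():
    x = 1
    # In function scope the returned dict is not an alias for the
    # fast locals, so this store does not rebind x.
    locals()["x"] = 99
    return x


print(Example.y)  # 2
print(foo())      # 1
```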
I wonder if we're valuing "does it return the same type" too much over "does it exhibit the same high-level (conceptual) behavior" here. Yes, a snapshot is a dict, just like what you get from locals() in class and module scopes. But no, that dict is not an alias for the actual contents of the scope. If we made locals() return the same proxy that f_locals gives, it's no longer a dict, but it has the same *conceptual* behavior (maybe: "meaning") as for the other types of scopes.

>> So apparently the key difference of opinion between Mark and Nick is about f_locals, and what to do with extras. In Nick's proposal when you reference f.f_locals twice in a row (for the same frame object f), you get the same proxy object, whereas in Mark's proposal you get a different object each time, but it doesn't matter, because the proxy has no state other than a reference to the frame.
>
> If PEP 558 is still giving that impression, I need to fix the wording - the proxy objects are ephemeral in both PEPs (the 558 text is slightly behind the implementation on that point, as the fast refs mapping is now stored on the frame object, so it only needs to be built once)

Ah, I didn't actually find a clear indication one way or another. If your proposal *also* makes f.f_locals return a new object on each use, the difference between the two proposals really is entirely in the C API, as you bring up below.

>> In Mark's proposal, if you assign a value to an extra variable, it gets stored in a hidden dict field on the frame, and when you read the proxy, the contents of that hidden dict field get included. This hidden dict is lazily created on the first store to an extra variable. (Mark shows pseudo-code to clarify this; the hidden dict is stored as _extra_locals on the frame.)
>
> PEP 558 works essentially the same way, the difference is that it uses the existing locals dict storage rather than adding new storage just for optimised frames.
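The "stateless proxy plus lazily created hidden dict" idea described above can be sketched as a toy model in pure Python. This is not either PEP's implementation: `ToyFrame` and `ToyLocalsProxy` are hypothetical names, a plain dict stands in for the real fast locals, and only `_extra_locals` is taken from the description in this thread.

```python
from collections.abc import MutableMapping


class ToyFrame:
    """Stand-in for a frame: fixed 'real' locals plus lazily created extras."""

    def __init__(self, **fast_locals):
        self.fast_locals = dict(fast_locals)  # names the function actually has
        self._extra_locals = None             # created on first extra store


class ToyLocalsProxy(MutableMapping):
    """Holds no state of its own, only a reference to the frame."""

    def __init__(self, frame):
        self._frame = frame

    def _extras(self):
        # Read-side helper: treat a missing extras dict as empty.
        return self._frame._extra_locals or {}

    def __getitem__(self, name):
        if name in self._frame.fast_locals:
            return self._frame.fast_locals[name]
        return self._extras()[name]  # raises KeyError if absent

    def __setitem__(self, name, value):
        frame = self._frame
        if name in frame.fast_locals:
            frame.fast_locals[name] = value  # write through to the real variable
        else:
            if frame._extra_locals is None:
                frame._extra_locals = {}     # lazy creation on first extra store
            frame._extra_locals[name] = value

    def __delitem__(self, name):
        # Toy simplification: only extras can be deleted.
        if name not in self._extras():
            raise KeyError(name)
        del self._frame._extra_locals[name]

    def __iter__(self):
        yield from self._frame.fast_locals
        yield from self._extras()

    def __len__(self):
        return len(self._frame.fast_locals) + len(self._extras())


frame = ToyFrame(x=1)
first = ToyLocalsProxy(frame)
first["y"] = 2    # extra variable: goes into the hidden dict
first["x"] = 10   # real variable: written through

second = ToyLocalsProxy(frame)  # a different proxy object, same frame
print(dict(second))             # {'x': 10, 'y': 2}
```

Because the proxy keeps nothing but the frame reference, it does not matter in this model whether two accesses hand back the same proxy object or two different ones.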
Oh, you're right. So then this doesn't really matter -- if there's no other use for the C-level field f_locals in a function scope, then we might as well use that to store the extras (assuming its NULL-ness isn't used as a flag for some other purpose). To be clear, I *think* that for a function scope where the f_locals property has never been used and locals() has never been called, the C-level f_locals field is NULL.

>> In Nick's proposal, there's a cache on the frame that stores both the extras and the proper variables. This cache can get out of sync with the contents of the proper variables when some bytecode is executed (for performance reasons we don't want the bytecode to keep the cache up to date on every store), so there's an operation to sync the frame cache (sync_frame_cache(), it's not defined in which namespace this exists -- is it a builtin or in sys?).
>
> It's an extra method on the proxy objects. You only need it if you keep an old proxy object around - if you always retrieve a new proxy object after executing Python code, that proxy will refresh
