[Python-Dev] Re: Memory address vs serial number in reprs
On Sun, Jul 19, 2020 at 1:34 PM Thomas Moreau wrote: > While it would be nice to have simpler identifiers for objects, it would > be hard to make it work for multiprocessing, as objects in different > interpreter would end up having the same repr. Shared objects (locks) might > also have different serial numbers depending on how many objects have been > created before it is communicated to the child process. > Adding to what was said here, there are serious implications outside of the multiprocessing case, too... 1) In a multi-threaded Python, threads will need to contend over a per-type counter, serializing the allocation of those counted types. 2) In a Python with tagged immediates (like fixnums, etc.) the added space cost would disqualify counted types from being implemented as an immediate value. This would force counted types to be heap-allocated and suffer from the aforementioned serialization. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HZQ2XIH3FAX44Q3ZXJIUQ7XOPYNKT7MN/ Code of Conduct: http://python.org/psf/codeofconduct/
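Point 1 above can be illustrated with a small sketch. It assumes a hypothetical `Counted` type that guards its per-type counter with an ordinary `threading.Lock`; the names `Counted`, `allocate` and `buckets` are made up for illustration and do not come from any proposal in this thread. Every constructor call has to take the same lock, so allocation of that type is serialized across threads.

```python
import threading


class Counted:
    """Hypothetical type whose instances carry a per-type serial number."""

    _lock = threading.Lock()  # shared by every thread that creates a Counted
    _next_serial = 1

    def __init__(self):
        # Each allocation passes through this one lock, whichever thread runs it.
        with Counted._lock:
            self.serial = Counted._next_serial
            Counted._next_serial += 1


def allocate(n, out):
    """Create n instances and record their serial numbers in out."""
    out.extend(Counted().serial for _ in range(n))


# Four threads allocate concurrently, each into its own list.
buckets = [[] for _ in range(4)]
threads = [threading.Thread(target=allocate, args=(1000, b)) for b in buckets]
for t in threads:
    t.start()
for t in threads:
    t.join()

serials = sorted(s for b in buckets for s in b)
```

The serial numbers come out unique and gap-free only because of the lock; that lock is the point of contention described above.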
[Python-Dev] Re: PEP 622 aspects
On Sun, Jul 19, 2020 at 3:00 PM Tobias Kohn wrote: > Quoting Koos Zevenhoven : > > > (1) Class pattern that does isinstance and nothing else. > > > > If I understand the proposed semantics correctly, `Class()` is > equivalent to checking `isinstance(obj, Class)`, also when `__match_args__` > is not present. However, if a future match protocol is allowed to override > this behavior to mean something else, for example `Class() == obj`, then > the plain isinstance checks won't work anymore! I do find `Class() == obj` > to be a more intuitive and consistent meaning for `Class()` than plain > `isinstance` is. > > > > Instead, the plain isinstance check would seem to be well described by a > pattern like `Class(...)`. This would allow isinstance checks for any > class, and there is even a workaround if you really want to refer to the > Ellipsis object. This is also related to the following point. > > > > (2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)` > > > > In the proposed semantics, cases like this are equivalent. I can see why > that is desirable in many cases, although Class(x=1, ...)` would make it > more clear. A possible improvement might be to add an optional element to > `__match_args__` that separates optional arguments from required ones > (although "optional" is not the same as "don't care"). > > > Please let me answer these two questions in reverse order, as I think it > makes more sense to tackle the second one first. > Possibly. Although I do find (1) a more serious issue than (2). To not have isinstance available by default in a consistent manner would definitely be a problem in my opinion. But the way I proposed to solve (1) may affect the user interpretations of (2). > ***2. Attributes*** > > There actually is an important difference between `Class(x=1, y=_)` and ` > Class(x=1)` and it won't do to just write `Class(x=1,...)` instead. The > form `Class(x=1, y=_)` ensures that the object has an attribute `y`. 
> In a way, this is where the "duck typing" is coming in.

Ok, that is indeed how the current class pattern match algorithm works according to the current PEP 622. Let me rephrase the title of problem (2) slightly to accommodate this:

"(2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)` (when the object has attributes x, y and "x", "y" are in `__match_args__`)"

> The class of an object and its actual shape (i.e. the set of attributes it
> has) are rather loosely coupled in Python: there is usually nothing in the
> class itself that specifies what attributes an object has (other than the
> good sense to add these attributes in `__init__`).

Usually, it is bad practice to define classes whose interface is not or cannot be specified. Python does, however, even allow hacks such as tacking an extra attribute onto an object where it doesn't really "belong".

> Conceptually, it therefore makes sense to not only support `isinstance`
> but also `hasattr`/`getattr` as a means to specify the shape/structure of
> an object.

Here we agree (although not necessarily regarding "therefore").

> Let me give a very simple example from Python's `AST` module. We know
> that compound statements have a field `body` (for the suite) and possibly
> even a field `orelse` (for the `else` part). But there is no common
> superclass for compound statements. Hence, although it is shared by
> several objects, you cannot detect this structure through `isinstance`
> alone. By allowing you to explicitly specify attributes in patterns, you
> can still use pattern matching notwithstanding:
> ```
> match node:
>     case ast.stmt(body=suite, orelse=else_suite) if else_suite:
>         # a statement with a non-empty else-part
>         ...
>     case ast.stmt(body=suite):
>         # a compound statement without else-part
>         ...
>     case ast.stmt():
>         # a simple statement
>         ...
> ```
> So this is an example of a combination of duck-typing and a class type.
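The quoted example reads as "check `isinstance`, then require and extract attributes with `getattr`". A minimal sketch of that reading, written without the proposed `match` syntax, might look as follows. The helper names `match_class`, `classify` and `_MISSING` are hypothetical, and subpatterns are reduced to plain captures; this is an illustration of the description, not the PEP's actual algorithm.

```python
import ast

_MISSING = object()  # sentinel: distinguishes "no such attribute" from None


def match_class(obj, cls, *attr_names):
    """Return a dict of the named attributes if obj is an instance of cls
    and has every one of them; otherwise return None."""
    if not isinstance(obj, cls):
        return None
    bound = {}
    for name in attr_names:
        value = getattr(obj, name, _MISSING)
        if value is _MISSING:
            return None  # a required attribute is absent: no match
        bound[name] = value
    return bound


def classify(node):
    """Mirror the three cases of the quoted example."""
    m = match_class(node, ast.stmt, "body", "orelse")
    if m is not None and m["orelse"]:  # the guard: non-empty else-part
        return "compound with else"
    if match_class(node, ast.stmt, "body") is not None:
        return "compound without else"
    if match_class(node, ast.stmt) is not None:
        return "simple"
    return "not a statement"


if_else = ast.parse("if x:\n    pass\nelse:\n    pass").body[0]
loop = ast.parse("while x:\n    pass").body[0]
assign = ast.parse("x = 1").body[0]
expr = ast.parse("x", mode="eval").body
```

Here `classify(if_else)` falls into the first case, `classify(loop)` into the second (its `orelse` exists but is empty), and `classify(assign)` into the third because an `Assign` node has no `body` attribute.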
I agree it's good to be able to have this type of matching available. I can only imagine the thought process that led you to bring up this example, but I feel that we got stuck on whether an attribute is present or not, which is a side track regarding the issues I pointed out. Python can be written in many ways, but I'm not sure that the above example is representative of how duck typing usually works. I see a lot more situations where you either care about isinstance or about some duck typing pattern – usually not both. > The very basic form of class patterns could be described as `C(a_1=P_1, > a_2=P_2, ...)`, where `C` is a class to be checked through `isinstance`, > and the `a_i` are attribute names to be extracted by means of `getattr` > to then be matched against the subpatterns `P_i`. In short: you specify > the structure not only by class, but also by its actual structure in form > of required attributes. > Ok, back on track now. But this won't do, if we want to be able to access is
[Python-Dev] Re: Memory address vs serial number in reprs
On 7/19/20 4:30 PM, Thomas Moreau wrote:

> Dear all,
>
> While it would be nice to have simpler identifiers for objects, it
> would be hard to make it work for multiprocessing, as objects in
> different interpreter would end up having the same repr. Shared
> objects (locks) might also have different serial numbers depending on
> how many objects have been created before it is communicated to the
> child process.
>
> regards
> Thomas

My guess is that these numbers are the 'id()' of the object, which as an implementation detail in CPython is the object address. If some other method was chosen for generating the object id, then by necessity, there would need to be a method to let multiple interpreters keep the number unique, perhaps some bits being reserved for an interpreter id, and the rest be a serial number.

-- Richard Damon

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZUEK3LQ2PBGXG4KZ2466EDNIDGNLAWR2/
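The "some bits for an interpreter id, the rest a serial number" idea can be sketched with plain integer arithmetic. The 48-bit split and the names `SERIAL_BITS`, `make_id` and `split_id` are assumptions chosen for illustration; nothing in the thread fixes a layout.

```python
SERIAL_BITS = 48  # assumed width of the serial part; not specified anywhere above
SERIAL_MASK = (1 << SERIAL_BITS) - 1


def make_id(interp_id, serial):
    """Pack an interpreter id and a per-interpreter serial into one integer."""
    if not 0 <= serial <= SERIAL_MASK:
        raise OverflowError("serial does not fit in the reserved bits")
    if interp_id < 0:
        raise ValueError("interpreter id must be non-negative")
    return (interp_id << SERIAL_BITS) | serial


def split_id(ident):
    """Recover (interpreter id, serial) from a packed identifier."""
    return ident >> SERIAL_BITS, ident & SERIAL_MASK
```

With this layout the same serial in two interpreters yields different identifiers, for example `make_id(0, 7)` and `make_id(1, 7)`, at the cost of capping how many serials one interpreter can hand out.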
[Python-Dev] Re: Memory address vs serial number in reprs
Dear all, While it would be nice to have simpler identifiers for objects, it would be hard to make it work for multiprocessing, as objects in different interpreter would end up having the same repr. Shared objects (locks) might also have different serial numbers depending on how many objects have been created before it is communicated to the child process. regards Thomas Le dim. 19 juil. 2020 à 21:26, Antoine Pitrou a écrit : > On Sun, 19 Jul 2020 18:38:30 +0300 > Serhiy Storchaka wrote: > > I have problem with the location of hexadecimal memory address in custom > > reprs. > > > > > > > > vs > > > > > > How about putting it in parentheses, to point more clearly that it can > most of the time be ignored: > > > > > I do not propose to use serial numbers for all objects, because it would > > increase the size of objects and the fixed-size integer can be > > overflowed for some short-living objects created in mass (like numbers, > > strings, tuples). But only for some custom objects implemented in > > Python, for which size and creation time are not critical. I want to > > start with synchronization objects in threading and multiprocessing > > which did not have custom reprs, than change reprs of locks and asyncio > > objects. > > > > Is it worth to do? > > I would like it if it applied to all objects, but doing it only for > certain objects will be distracting and confusing (does the serial > number point to a specific feature? it turns out it doesn't, it's just > an arbitrary aesthetical choice). > > Regards > > Antoine. 
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HGQWOQ6JPJ33YKG4UK2NQW2OX3BAPRZU/
[Python-Dev] Re: Memory address vs serial number in reprs
On Sun, 19 Jul 2020 18:38:30 +0300 Serhiy Storchaka wrote:

> I have problem with the location of hexadecimal memory address in custom
> reprs.
>
> vs

How about putting it in parentheses, to point more clearly that it can most of the time be ignored:

> I do not propose to use serial numbers for all objects, because it would
> increase the size of objects and the fixed-size integer can be
> overflowed for some short-living objects created in mass (like numbers,
> strings, tuples). But only for some custom objects implemented in
> Python, for which size and creation time are not critical. I want to
> start with synchronization objects in threading and multiprocessing
> which did not have custom reprs, than change reprs of locks and asyncio
> objects.
>
> Is it worth to do?

I would like it if it applied to all objects, but doing it only for certain objects will be distracting and confusing (does the serial number point to a specific feature? it turns out it doesn't, it's just an arbitrary aesthetical choice).

Regards

Antoine.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7ZSD6GHNJPS3LB74RE7OCI5J3AB642EE/
[Python-Dev] Re: Memory address vs serial number in reprs
That looks expensive, esp. for objects implemented in Python — an extra dict entry plus a new unique int object. What is the problem you are trying to solve for these objects specifically? Just that the hex numbers look distracting doesn’t strike me as sufficient motivation. On Sun, Jul 19, 2020 at 08:39 Serhiy Storchaka wrote: > I have problem with the location of hexadecimal memory address in custom > reprs. > > > > vs > > > > The long hexadecimal number makes the repr longer and distracts > attention from other useful information. We could get rid of it, but it > is useful if we want to distinguish objects of the same type. Although > it is hard to distinguish long hexadecimal numbers which differ only by > few digits in the middle. > > What if use serial numbers to differentiate instances? > > > > where the serial number starts with 1 and increased for every new > instance of that type. > > The advantages are: > > * Shorter repr. > * Easier to distinguish different objects. > * The serial number is unique for the life of program and cannot be > reused (in contrary to id/memory address). > > The disadvantages are: > > * Increased object size and creation time. > > I do not propose to use serial numbers for all objects, because it would > increase the size of objects and the fixed-size integer can be > overflowed for some short-living objects created in mass (like numbers, > strings, tuples). But only for some custom objects implemented in > Python, for which size and creation time are not critical. I want to > start with synchronization objects in threading and multiprocessing > which did not have custom reprs, than change reprs of locks and asyncio > objects. > > Is it worth to do? 
--
--Guido (mobile)

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7BR5USRVLRE34KH53RPMFV43Z7TPXRH5/
[Python-Dev] Re: Memory address vs serial number in reprs
On Sun, Jul 19, 2020 at 06:38:30PM +0300, Serhiy Storchaka wrote:

> What if use serial numbers to differentiate instances?

I like this idea. It is similar to how Jython and IronPython object IDs work:

    # Jython
    >>> id(None)
    2
    >>> id(len)
    3
    >>> object()

> I do not propose to use serial numbers for all objects, because it would
> increase the size of objects and the fixed-size integer can be
> overflowed for some short-living objects created in mass (like numbers,
> strings, tuples). But only for some custom objects implemented in
> Python, for which size and creation time are not critical. I want to
> start with synchronization objects in threading and multiprocessing
> which did not have custom reprs, than change reprs of locks and asyncio
> objects.

This sounds reasonable to me. +1

-- Steven

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PDN2DF3BTU4P3N5MD5GEHUZRAT6ETGU5/
[Python-Dev] Memory address vs serial number in reprs
I have a problem with the location of the hexadecimal memory address in custom reprs.

vs

The long hexadecimal number makes the repr longer and distracts attention from other useful information. We could get rid of it, but it is useful if we want to distinguish objects of the same type. That said, it is hard to distinguish long hexadecimal numbers that differ only by a few digits in the middle.

What if we use serial numbers to differentiate instances?

where the serial number starts at 1 and is increased for every new instance of that type.

The advantages are:

* Shorter repr.
* Easier to distinguish different objects.
* The serial number is unique for the life of the program and cannot be reused (in contrast to the id/memory address).

The disadvantages are:

* Increased object size and creation time.

I do not propose to use serial numbers for all objects, because it would increase the size of objects, and a fixed-size integer can overflow for short-lived objects created en masse (like numbers, strings, tuples). I propose them only for some custom objects implemented in Python, for which size and creation time are not critical. I want to start with synchronization objects in threading and multiprocessing which did not have custom reprs, then change the reprs of locks and asyncio objects.

Is it worth doing?

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E6YEXMQ4OE5YGZGRP62JOLTAGBCL6RCX/
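For readers who want to try the idea, here is a minimal sketch of a per-type serial number in a repr. The mixin name `SerialRepr`, the `#N` display format, and the example classes `Worker` and `Job` are all assumptions made for illustration; the original example reprs did not survive in this archive, so the format shown is not necessarily the one proposed.

```python
import itertools


class SerialRepr:
    """Hypothetical mixin: number the instances of each subclass 1, 2, 3, ..."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._serials = itertools.count(1)  # one independent counter per subclass

    def __init__(self):
        # Stored per instance: this is the extra size and creation cost
        # listed under the disadvantages.
        self._serial = next(type(self)._serials)

    def __repr__(self):
        return f"<{type(self).__name__} #{self._serial}>"


class Worker(SerialRepr):
    pass


class Job(SerialRepr):
    pass


w1, w2, j1 = Worker(), Worker(), Job()
```

Each subclass counts separately, so `repr(w2)` is `<Worker #2>` while `repr(j1)` is `<Job #1>`. Unlike a memory address, a serial is never handed out twice within one process.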
[Python-Dev] Re: PEP 622: Structural Pattern Matching -- followup
On 08/07/2020 16:15, MRAB wrote:

> On 2020-07-08 03:08, Rob Cliffe via Python-Dev wrote:
> > Why not use '=' to distinguish binding from equality testing:
> >
> >     case Point(x, =y): # matches a Point() with 2nd parameter equal to y; if it does, binds to x.
> >
> > This would allow a future (or present!) extension to other relative operators:
> >
> >     case Point(x, >y):
> >
> > (although the syntax doesn't AFAICS naturally extend to specifying a range, i.e. an upper and lower bound, which might be a desirable thing to do. Perhaps someone can think of a way of doing it).
> >
> > Whether
> >
> >     case =42:
> >     case 42:
> >
> > would both be allowed would be one issue to be decided.
>
> In Python, '=' is assignment and '==' is equality. Using '=' for equality could lead to confusion.

Fair enough. In that case use `==` instead:

    case Point(x, ==y): # if matches a Point with the given y-value, bind to x

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NXZYPY4HERA3REEENK7OHADM3OWFVGJV/
[Python-Dev] Re: Another take on PEP 622
Hi Terry,

Thank you: I really like your wave/particle analogy. I think that pattern matching is indeed uniting different concepts to build a stronger, more versatile structure.

I also like your concept of a general "binding structure" with different forms, such as assignment, parameter passing, and (according to PEP 622) patterns. I feel that this categorisation puts pattern matching quite in the correct place. Of course, there is also the second aspect of "bind these variables, /if you can/" or "analyse and compare the structure", which is the other part of this particle/wave duality.

Concerning the guards (optional conditions), I also think that you summarised it very neatly. I tend to think of the patterns a bit like the grammar of a programming language: something that is supposed to be static and declarative (as far as this is possible in Python). Yet, some constraints in programming languages cannot be expressed by a (context-free) grammar in a meaningful way. For instance, the grammar itself does not keep you from having two parameters in a function sharing the same name. This is a dynamic aspect, a relationship between two otherwise independent parts of the overall structure. And that is best caught by the guards. So, yes: as I see it, the guards really add something that goes beyond the declarative side of patterns. Hence, I completely agree with your characterisation.

Kind regards,
Tobias

Quoting Terry Reedy :

On 7/16/2020 9:51 PM, Tobias Kohn wrote:

Hi Everyone, I feel there are still quite a few misconceptions around concerning PEP 622 and the new pattern matching feature it proposes. Please allow me therefore to take another attempt at explaining the ideas behind PEP 622 with a different approach. Bear in mind that I naturally cannot cover everything, though, and that some parts presented here are still slightly simplified.

Thank you Tobias. I am a fan of looking at things from multiple viewpoints.
For 200 years, physicists argued about whether light is waves or particles, before discovering that 'neither' and 'both' were more correct. 1. Function Overloading 2. Visitor Pattern and Dispatch > 3. Shape and Structure [snip, snip, snip] In an assignment statement, the code to the left of '=' is a 'target list' of 'target's, with some idiosyncratic rules. Even though it might, misleadingly, be called a 'target expression', it is not an expression to be evaluated. Similarly, the code between parentheses in a def statement is a 'parameter list' of 'defparameter' and special symbols, with other idiosyncratic rules. Both could be called 'binding lists' or more generally, 'binding structures'. To me, the important point of your point is that 'case' is somewhat analogous to 'def', and that the stuff between 'case' and ':' is a binding structure. We should call this structure a 'match structure'. It is misleading and confusing to call it a 'match expression'. A match structure consists of a 'match list' of 'match items' or 'matcher's and an optional "'if' ". Matchers have a 3rd, new, and larger set of idiosyncratic rules. The optional condition is an escape hatch for when expressing the intended match constraint and corresponding match set is difficult or worse using the match rules. As with target and parameter items, untagged simple name matchers are (and I now see, should be) binding targets. (The parameter list tags are ':' for types and '=' for default values.) Unlike assignment targets, dotted names and subscriptings are not binding targets. Like parameter lists, match lists include literals. Match lists also include syntactic structure not seen in the other binding structures. 
-- Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6XZXGOHDBRN5ZXREZWB74DYHUGIEPLYI/
[Python-Dev] Re: PEP 622 aspects
Hi Koos,

Let me try and address some of the concerns and questions you are raising. I am replying here to two emails of yours so as to keep traffic down.

Quoting Koos Zevenhoven :

> (1) Class pattern that does isinstance and nothing else.
>
> If I understand the proposed semantics correctly, `Class()` is equivalent to checking `isinstance(obj, Class)`, also when `__match_args__` is not present. However, if a future match protocol is allowed to override this behavior to mean something else, for example `Class() == obj`, then the plain isinstance checks won't work anymore! I do find `Class() == obj` to be a more intuitive and consistent meaning for `Class()` than plain `isinstance` is.
>
> Instead, the plain isinstance check would seem to be well described by a pattern like `Class(...)`. This would allow isinstance checks for any class, and there is even a workaround if you really want to refer to the Ellipsis object. This is also related to the following point.
>
> (2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)`
>
> In the proposed semantics, cases like this are equivalent. I can see why that is desirable in many cases, although `Class(x=1, ...)` would make it more clear. A possible improvement might be to add an optional element to `__match_args__` that separates optional arguments from required ones (although "optional" is not the same as "don't care").

Please let me answer these two questions in reverse order, as I think it makes more sense to tackle the second one first.

**2. ATTRIBUTES**

There actually is an important difference between `Class(x=1, y=_)` and `Class(x=1)` and it won't do to just write `Class(x=1,...)` instead. The form `Class(x=1, y=_)` ensures that the object has an attribute `y`. In a way, this is where the "duck typing" is coming in. The class of an object and its actual shape (i.e.
the set of attributes it has) are rather loosely coupled in Python: there is usually nothing in the class itself that specifies what attributes an object has (other than the good sense to add these attributes in `__init__`). Conceptually, it therefore makes sense to not only support `isinstance` but also `hasattr`/`getattr` as a means to specify the shape/structure of an object.

Let me give a very simple example from Python's `AST` module. We know that compound statements have a field `body` (for the suite) and possibly even a field `orelse` (for the `else` part). But there is no common superclass for compound statements. Hence, although it is shared by several objects, you cannot detect this structure through `isinstance` alone. By allowing you to explicitly specify attributes in patterns, you can still use pattern matching notwithstanding:

```
match node:
    case ast.stmt(body=suite, orelse=else_suite) if else_suite:
        # a statement with a non-empty else-part
        ...
    case ast.stmt(body=suite):
        # a compound statement without else-part
        ...
    case ast.stmt():
        # a simple statement
        ...
```

The very basic form of class patterns could be described as `C(a_1=P_1, a_2=P_2, ...)`, where `C` is a class to be checked through `isinstance`, and the `a_i` are attribute names to be extracted by means of `getattr` to then be matched against the subpatterns `P_i`. In short: you specify the structure not only by class, but also by its actual structure in form of required attributes.

Particularly for very simple objects, it becomes annoying to specify the attribute names each time. Take, for instance, the `Num`-expression from the AST. It has just a single field `n` to hold the actual number. But the AST objects also contain an attribute `_fields = ('n',)` that not only lists the *relevant* attributes, but also specifies an order. It thus makes sense to introduce a convention that in `Num(x)` without argument name, the `x` corresponds to the first field `n`.
Likewise, you write `UnaryOp('+', item)` without the attribute names because `_fields=('op', 'operand')` already tells you what attributes are meant. That is essentially the principle we adopted through the introduction of `__match_args__`.

**1. MATCH PROTOCOL**

I am not entirely sure what you mean by `C() == obj`. In most cases you could not actually create an instance of `C` without some meaningful arguments for the constructor.

The idea of the match-protocol is very similar to how you can already override the behaviour of `isinstance`. It is not meant to completely change the semantics of what is already there, but to allow you to customise it (in some exciting ways ^_^). Of course, as with everything customisable, you could go off and do something funny with it, but if it then breaks, that's quite on you.

With the caveat that this is **NOT PART OF THIS PEP (!)**, let me try and explain why we would consider a match protocol in the first place. The s