Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Stephen J. Turnbull wrote:
> So what does the 1/0 that occurs in [1/x for x in range(-5, 6)] mean? In what sense is it equal to itself? How can something which is not a number be compared for numerical equality?

I would say it *can't* be compared for *numerical* equality. It might make sense to compare it using some other notion of equality.

One of the problems here, I think, is that Python only lets you define one notion of equality for each type, and that notion is the one that gets used when you compare collections of that type. (Or at least it's supposed to, but the identity-implies-equality shortcut that gets taken in some places interferes with that.)

So if you're going to decide that it doesn't make sense to compare undefined numeric quantities, then it doesn't make sense to compare lists containing them either.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Guido van Rossum wrote:
> Currently NaN is not violating any language rules -- it is just violating users' intuition, in a much worse way than Inf does.

If it's to be an official language non-rule (by which I mean that types are officially allowed to compare non-reflexively) then any code assuming that identity implies equality for arbitrary objects is broken and should be fixed.

--
Greg
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 1:40 AM, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
..
> The Pythonic thing to do (in the Python 3 world at least) would be to regard NaNs as non-comparable and raise an exception.

As I mentioned in a previous post, I agree in case of <, <=, >, or >= comparisons, but == and != are a harder case because you don't want, for example:

    >>> [1,2,float('nan'),3].index(3)
    3

to raise an exception.
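The point about == versus ordering can be checked directly. The following is a minimal sketch, assuming a current CPython 3.x interpreter; the variable names are illustrative only and do not come from the thread.

```python
# Searching a list that contains a NaN: index() walks the list and
# compares each element against the target with ==.
nan = float("nan")
values = [1, 2, nan, 3]

# nan == 3 is simply False, so the search continues past the NaN
# instead of raising.
position = values.index(3)

# Equality on the NaN itself is not reflexive.
nan_equals_itself = (nan == nan)
```

If == raised for NaN operands, the index() call would fail part-way through the list even though the value being searched for is an ordinary number.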
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 8:43 PM, Nick Coghlan wrote:
> On Thu, Apr 28, 2011 at 12:42 PM, Stephen J. Turnbull step...@xemacs.org wrote:
>> Mark Dickinson writes:
>>> Declaring that 'nan == nan' should be True seems attractive in theory,
>>
>> No, it's intuitively attractive, but that's because humans like nice continuous behavior. In *theory*, it's true that some singularities are removable, and the NaN that occurs when evaluating at that point is actually definable in a broader context, but the point of NaN is that some singularities are *not* removable. This is somewhat Pythonic: In the presence of ambiguity, refuse to guess.
>
> Refusing to guess in this case would be to treat all NaNs as signalling NaNs, and that wouldn't be good, either :)
>
> I like Terry's suggestion for a glossary entry, and have created an updated proposal at http://bugs.python.org/issue11945
>
> (I also noted that array.array is like collections.Sequence in failing to enforce the container invariants in the presence of NaN values)

In that bug, Nick, you mention that reflexive equality is something that container classes rely on in their implementation. Such reliance seems to me to be a bug, or an inappropriate optimization, rather than a necessity. I realize that classes that do not define equality use identity as their default equality operator, and that is acceptable for items that do not or cannot have any better equality operator. It does lead to the situation where two objects that are bit-for-bit clones get separate entries in a set... exactly the same as how NaNs of different identity work... the situation with a NaN of the same identity not being added to the set multiple times seems to simply be a bug because of conflating identity and equality, and should not be relied on in container implementations.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 2:20 AM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
..
> In that bug, Nick, you mention that reflexive equality is something that container classes rely on in their implementation. Such reliance seems to me to be a bug, or an inappropriate optimization, ..

An alternative interpretation would be that it is a bug to use NaN values in lists. It is certainly nonsensical to use NaNs as keys in dictionaries, and that reportedly led the Java designers to forgo the non-reflexivity of NaNs:

    "A NaN value is not equal to itself. However, a NaN Java Float object is equal to itself. The semantic is defined this way, because otherwise NaN Java Float objects cannot be retrieved from a hash table."
    - http://www.concentric.net/~ttwang/tech/javafloat.htm

With the status quo in Python, it may only make sense to store NaNs in array.array, but not in a list.
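The hash-table concern quoted from the Java note has a direct Python counterpart. This is a small sketch, assuming CPython's dict lookup (which checks identity before falling back to ==); the names `table`, `same_object_lookup` and `fresh_nan_lookup` are made up for illustration.

```python
# A NaN used as a dictionary key can be retrieved only through the very
# same object, because the lookup checks identity before trying ==.
nan = float("nan")
table = {nan: "stored"}

# Same object: the identity check succeeds, so the entry is found.
same_object_lookup = table.get(nan)

# A different NaN object is neither identical nor equal to the key,
# so the entry cannot be reached through it.
fresh_nan_lookup = table.get(float("nan"))
```

So Python keeps NaN non-reflexive under == and still lets the original key object retrieve its entry, which is the identity shortcut this thread is about.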
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 4:20 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
> In that bug, Nick, you mention that reflexive equality is something that container classes rely on in their implementation. Such reliance seems to me to be a bug, or an inappropriate optimization, rather than a necessity. I realize that classes that do not define equality use identity as their default equality operator, and that is acceptable for items that do not or cannot have any better equality operator. It does lead to the situation where two objects that are bit-for-bit clones get separate entries in a set... exactly the same as how NaNs of different identity work... the situation with a NaN of the same identity not being added to the set multiple times seems to simply be a bug because of conflating identity and equality, and should not be relied on in container implementations.

No, as Raymond has articulated a number of times over the years, it's a property of the equivalence relation that is needed in order to present sane invariants to users of the container.

I included in the bug report the critical invariants I am currently aware of that should hold, even when the container may hold types with a non-reflexive definition of equality:

    assert [x] == [x]      # Generalised to all container types
    assert not [x] != [x]  # Generalised to all container types
    for x in c:
        assert x in c
        assert c.count(x) > 0             # If applicable
        assert 0 <= c.index(x) < len(c)   # If applicable

The builtin types all already work this way, and that's a deliberate choice - my proposal is simply to document the behaviour as intentional, and fix the one case I know of in the standard library where we don't implement these semantics correctly (i.e. collections.Sequence).

The question of whether or not float and decimal.Decimal should be modified to have reflexive definitions of equality (even for NaN values) is actually orthogonal to the question of clarifying and documenting the expected semantics of containers in the face of non-reflexive definitions of equality.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
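The invariants listed in this message can be exercised against a builtin list that holds a NaN. This is a sketch assuming CPython's list implementation; the helper name `check_container_invariants` is hypothetical and not part of the proposal.

```python
def check_container_invariants(c):
    """Return True if the invariants from the message hold for sequence c."""
    for x in c:
        if not [x] == [x]:
            return False
        if [x] != [x]:
            return False
        if x not in c:
            return False
        if not c.count(x) > 0:
            return False
        if not 0 <= c.index(x) < len(c):
            return False
    return True


nan = float("nan")
holds_for_list = check_container_invariants([1.0, nan, 2.0])
```

The check passes for the list even though `nan == nan` is False, because the list operations test identity before falling back to ==.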
Re: [Python-Dev] Socket servers in the test suite
Nick Coghlan ncoghlan at gmail.com writes:
> If you poke around in the test directory a bit, you may find there is already some code along these lines in other tests (e.g. I'm pretty sure the urllib tests already fire up a local server). Starting down the path of standardisation of that test functionality would be good.

I have poked around, and each test module pretty much does its own thing. Perhaps that's unavoidable; I'll try and see if there are usable common patterns in the specific instances.

> For larger components like this, it's also reasonable to add a dedicated helper module rather than using test.support directly. I started (and Antoine improved) something along those lines with the test.script_helper module for running Python subprocesses and checking their output, although it lacks documentation and there are lots of older tests that still use subprocess directly.

Yes, I thought perhaps it was too specialised for adding to test.support itself.

Thanks for the feedback,

Vinay
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/2011 11:54 PM, Nick Coghlan wrote:
> On Thu, Apr 28, 2011 at 4:20 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
>> In that bug, Nick, you mention that reflexive equality is something that container classes rely on in their implementation. Such reliance seems to me to be a bug, or an inappropriate optimization, rather than a necessity. I realize that classes that do not define equality use identity as their default equality operator, and that is acceptable for items that do not or cannot have any better equality operator. It does lead to the situation where two objects that are bit-for-bit clones get separate entries in a set... exactly the same as how NaNs of different identity work... the situation with a NaN of the same identity not being added to the set multiple times seems to simply be a bug because of conflating identity and equality, and should not be relied on in container implementations.
>
> No, as Raymond has articulated a number of times over the years, it's a property of the equivalence relation that is needed in order to present sane invariants to users of the container.

I probably wasn't around when Raymond did his articulation :) Sorry for whatever amount of rehashing I'm doing here -- pointers to some of the articulation would be welcome, but perhaps the summary below is intended to recap the results of such discussions. If my comments below seem to be grasping the essence of those discussions, then no need for the pointers... if I'm way off, I'd like to read a thread or two.

> I included in the bug report the critical invariants I am currently aware of that should hold, even when the container may hold types with a non-reflexive definition of equality:
>
>     assert [x] == [x]      # Generalised to all container types
>     assert not [x] != [x]  # Generalised to all container types
>     for x in c:
>         assert x in c
>         assert c.count(x) > 0             # If applicable
>         assert 0 <= c.index(x) < len(c)   # If applicable
>
> The builtin types all already work this way, and that's a deliberate choice - my proposal is simply to document the behaviour as intentional, and fix the one case I know of in the standard library where we don't implement these semantics correctly (i.e. collections.Sequence).
>
> The question of whether or not float and decimal.Decimal should be modified to have reflexive definitions of equality (even for NaN values) is actually orthogonal to the question of clarifying and documenting the expected semantics of containers in the face of non-reflexive definitions of equality.

Yes, I agree they are orthogonal questions... separate answers and choices can be made for specific classes. Just as some classes implement equality using identity, it would also be possible to implement identity using equality, and it is possible to conflate the two, as has apparently been deliberately done for Python containers without reflecting that in the documentation.

If the containers have been deliberately implemented in that way, and it is not appropriate to change them, then more work is needed in the documentation than just your proposed Glossary definition, as the very intuitive descriptions in the Comparisons section are quite at odds with the current implementation.

Without having read the original articulations by Raymond or any discussions of the pros and cons, it would appear that the above list of invariants, which you refer to as sane, is derived from a pre-NaN or reflexive equality perspective; while some folk perhaps think the concept of NaN is a particular brand of insanity, it is a standard brand, and therefore worthy of understanding and discussion. And clearly, if the NaN perspective is intentionally corralled in Python, then the documentation needs to be clarified.

On the other hand, the SQL language has embraced the same concept as NaN in its concept of NULL, and has pushed that concept (they call it three-valued logic, I think) clear through the language. NULL == NULL is not True, and it is not False, but it is NULL. Of course, the language is different in other ways than Python; values are not objects and have no identity, but they do have collections of values called tuples, columns, and tables, which are similar to lists and lists of lists. And they have mappings called indexes. And they've made it all work with the concept of NULL and three-valued logic. And sane people work with database systems built around such concepts.

So I guess I reject the argument that the above invariants are required for sanity. On the other hand, having not much Python internals knowledge as yet, I'm in no position to know how seriously things would break internally if a different set of invariants that embrace and extend the concept of non-reflexive equality were invented to replace the above, nor whether there is a compatible migration path to achieve it in a reasonable manner...
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 2:54 AM, Nick Coghlan ncogh...@gmail.com wrote:
..
> No, as Raymond has articulated a number of times over the years, it's a property of the equivalence relation that is needed in order to present sane invariants to users of the container. I included in the bug report the critical invariants I am currently aware of that should hold, even when the container may hold types with a non-reflexive definition of equality:
>
>     assert [x] == [x]      # Generalised to all container types
>     assert not [x] != [x]  # Generalised to all container types
>     for x in c:
>         assert x in c
>         assert c.count(x) > 0             # If applicable
>         assert 0 <= c.index(x) < len(c)   # If applicable

It is an interesting question what sane invariants are. Why do you consider the invariants that you listed essential, while, say,

    if c1 == c2:
        assert all(x == y for x,y in zip(c1, c2))

is optional? Can you give examples of algorithms that would break if one of your invariants is violated, but would still work if the data contains NaNs?
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 5:27 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
> Without having read the original articulations by Raymond or any discussions of the pros and cons,

In my first post to this thread, I pointed out the bug tracker item (http://bugs.python.org/issue4296) that included the discussion of restoring this behaviour to the 3.x branch, after it was inadvertently removed.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] the role of assert in the standard library ?
Hello,

I removed some assert calls in distutils some time ago because the package was not behaving correctly when people were using Python with the --optimize flag. In other words, assert became a full part of the code logic and removing them via -O was changing the behavior.

In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :)

So, I grepped the stdlib for assert calls, and I have found 177 of them, and many of them are making Python act differently depending on the -O flag.

Here's an example on a randomly picked assert in the threading module:

    import threading

    class test(threading.Thread):
        def __init__(self):
            self.bla = 1

        def run(self):
            print('running')

    t = test()
    print(t)

The __repr__ method is not behaving the same way depending on the -O flag:

    $ python3 -O test.py
    Traceback (most recent call last):
      File "test.py", line 12, in <module>
        print(t)
      File "/usr/local/lib/python3.2/threading.py", line 652, in __repr__
        if self._started.is_set():
    AttributeError: 'test' object has no attribute '_started'

    $ python3 test.py
    Traceback (most recent call last):
      File "test.py", line 12, in <module>
        print(t)
      File "/usr/local/lib/python3.2/threading.py", line 650, in __repr__
        assert self._initialized, "Thread.__init__() was not called"
    AttributeError: 'test' object has no attribute '_initialized'

    $ python test.py
    Traceback (most recent call last):
      File "test.py", line 12, in <module>
        print(t)
      File "/usr/lib/python2.6/threading.py", line 451, in __repr__
        assert self.__initialized, "Thread.__init__() was not called"
    AssertionError: Thread.__init__() was not called    <--- oops different error

    $ python -O test.py
    Traceback (most recent call last):
      File "test.py", line 12, in <module>
        print(t)
      File "/usr/lib/python2.6/threading.py", line 453, in __repr__
        if self.__started.is_set():
    AttributeError: 'test' object has no attribute '_Thread__started'

I have seen some other places where things would simply break with -O.

Am I right in thinking we should do a pass on those and remove them, or turn them into exceptions that are triggered with -O as well? This flag is meant to optimize generated bytecode slightly, but I am not sure this also involves slightly changing the way the code behaves.

Cheers
Tarek

--
Tarek Ziadé | http://ziade.org
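The effect described above can be reproduced inside a single process, without invoking the interpreter twice. This is a rough sketch that relies on the `optimize` argument of the builtin `compile()` (level 1 strips assert statements, as -O does); the `source` snippet and the `run` helper are hypothetical and are not taken from threading.py.

```python
# A two-line program whose first line is an assert that guards the second.
source = (
    "assert initialized, 'Thread.__init__() was not called'\n"
    "result = 'reached'\n"
)


def run(optimize):
    """Compile and execute `source` at the given optimization level."""
    namespace = {"initialized": False}
    code = compile(source, "<demo>", "exec", optimize=optimize)
    try:
        exec(code, namespace)
    except AssertionError:
        return "AssertionError"
    return namespace["result"]


normal = run(0)     # asserts kept: the failed assert stops execution
optimized = run(1)  # asserts stripped: execution continues past the check
```

The same source takes two different paths depending on the optimization level, which is the behaviour change the message objects to.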
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 5:30 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote:
> On Thu, Apr 28, 2011 at 2:54 AM, Nick Coghlan ncogh...@gmail.com wrote:
> ..
>> No, as Raymond has articulated a number of times over the years, it's a property of the equivalence relation that is needed in order to present sane invariants to users of the container. I included in the bug report the critical invariants I am currently aware of that should hold, even when the container may hold types with a non-reflexive definition of equality:
>>
>>     assert [x] == [x]      # Generalised to all container types
>>     assert not [x] != [x]  # Generalised to all container types
>>     for x in c:
>>         assert x in c
>>         assert c.count(x) > 0             # If applicable
>>         assert 0 <= c.index(x) < len(c)   # If applicable
>
> It is an interesting question of what sane invariants are. Why you consider the invariants that you listed essential while say
>
>     if c1 == c2:
>         assert all(x == y for x,y in zip(c1, c2))
>
> optional?

Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true *regardless* of IEEE754 by checking identity explicitly.

The correct assertion under Python's current container semantics is:

    if list(c1) == list(c2):  # Make ordering assumption explicit
        assert all(x is y or x == y for x,y in zip(c1, c2))  # Enforce reflexivity

Meyer is a purist - sticking with the mathematical definition of equality is the sort of thing that fits his view of the world and what Eiffel should be, even if it hinders interoperability with other languages and tools. Python tends to be a bit more pragmatic about things, in particular when it comes to interoperability, so it makes sense to follow IEEE754 and the decimal specification at the individual comparison level.

However, we can contain the damage to some degree by specifying that containers should enforce reflexivity where they need it. This is already the case at the implementation level (collections.Sequence aside), it just needs to be pushed up to the language definition level.

> Can you give examples of algorithms that would break if one of your invariants is violated, but would still work if the data contains NaNs?

Sure, anything that cares more about objects than it does about values. The invariants are about making containers behave like containers as far as possible, even in the face of recalcitrant types like IEEE754 floating point.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
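The difference between the two assertions debated here shows up as soon as two lists share one NaN object. The snippet below is a sketch assuming CPython list comparison; `c1`, `c2` and the result names are illustrative.

```python
nan = float("nan")
c1 = [1.0, nan]
c2 = [1.0, nan]   # a different list holding the same NaN object

# List equality applies the identity shortcut element by element.
lists_equal = list(c1) == list(c2)

# The element-wise assertion using only == fails on the NaN pair.
strict_elementwise = all(x == y for x, y in zip(c1, c2))

# The corrected assertion, with the identity check made explicit, holds.
relaxed_elementwise = all(x is y or x == y for x, y in zip(c1, c2))
```

Here the lists compare equal while the ==-only check on their elements reports a mismatch; the `x is y or x == y` form agrees with what the list comparison does.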
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 04/28/2011 04:31 AM, Stephen J. Turnbull wrote:
> Are you saying you would expect that
>
>     >>> nan = float('nan')
>     >>> a = [1, ..., 499, nan, 501, ..., 999]    # meta-ellipsis, not Ellipsis
>     >>> a == a
>     False
>
> ??

I would expect l1 == l2, where l1 and l2 are both lists, to be semantically equivalent to len(l1) == len(l2) and all(imap(operator.eq, l1, l2)). Currently it isn't, and that was the motivation for this thread.

If objects that break reflexivity of == are not allowed, this should be documented, and such objects banished from the standard library.

Hrvoje
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 3:57 AM, Nick Coghlan ncogh...@gmail.com wrote:
..
>> It is an interesting question of what sane invariants are. Why you consider the invariants that you listed essential while say
>>
>>     if c1 == c2:
>>         assert all(x == y for x,y in zip(c1, c2))
>>
>> optional?
>
> Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true *regardless* of IEEE754 by checking identity explicitly.

AFAIK, IEEE754 says nothing about comparison of containers, so my invariant cannot violate it. What you probably wanted to say is that my invariant cannot be achieved in the presence of IEEE754 conforming floats, but this observation by itself does not make my invariant less important than yours. It just makes yours easier to maintain.

> The correct assertion under Python's current container semantics is:
>
>     if list(c1) == list(c2):  # Make ordering assumption explicit
>         assert all(x is y or x == y for x,y in zip(c1, c2))  # Enforce reflexivity

Being correct is different from being important. What practical applications of lists containing NaNs do this and your other invariants enable? I think even with these invariants in place one should either filter out NaNs from their lists or replace them with None before applying container operations.
Re: [Python-Dev] the role of assert in the standard library ?
On 4/28/2011 3:54 AM, Tarek Ziadé wrote:
> Hello I removed some assert calls in distutils some time ago because the package was not behaving correctly when people were using Python with the --optimize flag. In other words, assert became a full part of the code logic and removing them via -O was changing the behavior. In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :)

My understanding is that assert can be used in production code but only to catch logic errors by testing supposed invariants or postconditions. It should not be used to test usage errors, including preconditions. In other words, assert presence or absence should not affect behavior unless the code has a bug.

> So, I grepped the stdlib for assert calls, and I have found 177 of them and many of them are making Python acts differently depending on the -O flag, Here's an example on a randomly picked assert in the threading module:

This, to me, is wrong:

    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        assert group is None, "group argument must be None for now"

That catches a usage error and should raise a ValueError.

This

    def _wait(self, timeout):
        if not self._cond.wait_for(lambda : self._state != 0, timeout):
            #timed out.  Break the barrier
            self._break()
            raise BrokenBarrierError
        if self._state < 0:
            raise BrokenBarrierError
        assert self._state == 1

appears to be, or should be, a test of a postcondition that should *always* be true regardless of usage.

--
Terry Jan Reedy
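The split between usage errors and internal postconditions described above might look like the following. This is only a sketch: the `Worker` class and its `finish` method are hypothetical stand-ins, not code from the threading module.

```python
class Worker:
    """Hypothetical class showing where an exception and where an assert fit."""

    def __init__(self, group=None):
        # Usage error: checked with a real exception so the check
        # survives when Python runs with -O.
        if group is not None:
            raise ValueError("group argument must be None for now")
        self._state = 0

    def finish(self):
        self._state = 1
        # Postcondition: can only fail if this class has a bug, so
        # stripping it under -O does not change behaviour for callers.
        assert self._state == 1
        return self._state
```

With this arrangement a caller passing a bad argument gets the same ValueError with or without -O, and the assert remains purely a development aid.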
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/2011 12:32 AM, Nick Coghlan wrote:
> On Thu, Apr 28, 2011 at 5:27 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote:
>> Without having read the original articulations by Raymond or any discussions of the pros and cons,
>
> In my first post to this thread, I pointed out the bug tracker item (http://bugs.python.org/issue4296) that included the discussion of restoring this behaviour to the 3.x branch, after it was inadvertently removed.

Sure. I had read that. It was mostly discussing it from a backward compatibility perspective, although it mentioned some invariants as well, etc. But mentioning the invariants is different from reading discussion about the pros and cons of such, or what reasoning led to wanting them to be invariants. Raymond does make a comment about "necessary for correctly reasoning about programs", but that is just a tautological statement based on previous agreement, rather than being the discussion itself, which must have happened significantly earlier.

One of your replies to Alexander seems to say the same thing I was saying, though.

On 4/28/2011 12:57 AM, Nick Coghlan wrote:
> On Thu, Apr 28, 2011 at 5:30 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote:
>> Can you give examples of algorithms that would break if one of your invariants is violated, but would still work if the data contains NaNs?
>
> Sure, anything that cares more about objects than it does about values. The invariants are about making containers behave like containers as far as possible, even in the face of recalcitrant types like IEEE754 floating point.

That reinforces the idea that the discussion about containers was to try to make them like containers in pre-NaN languages such as Eiffel, rather than in post-NaN languages such as SQL. It is not that one cannot reason about containers in either case, but rather that one cannot borrow all the reasoning from pre-NaN concepts and apply it to post-NaN concepts. So if one's experience is with pre-NaN container concepts, one pushes that philosophy and reasoning instead of embracing and extending post-NaN concepts. That's not all bad, except when the documentation says one thing and the implementation does something else. Your comment in that same message, "we can contain the damage to some degree", speaks to that philosophy.

Based on my current limited knowledge of Python internals, and available time to pursue figuring out whether the compatibility issues would preclude extending Python containers to embrace post-NaN concepts, I'll probably just learn your list of invariants, and just be aware that if I need a post-NaN container, I'll have to implement it myself. I suspect doing sequences would be quite straightforward, other containers less so, unless the application of concern is sufficiently value-based to permit the trick of creating a new NaN each time it is inserted into a different container.
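A "post-NaN" sequence of the kind this message says would have to be written by hand could be sketched as a list subclass that never consults identity. The class name `StrictList` and its design are assumptions made for illustration; nothing like it is proposed in the thread, and only membership and equality are overridden here.

```python
class StrictList(list):
    """Hypothetical list whose membership and equality use == only."""

    def __contains__(self, item):
        # No identity shortcut: an element matches only if == says so.
        return any(element == item for element in self)

    def __eq__(self, other):
        if not isinstance(other, list):
            return NotImplemented
        return len(self) == len(other) and all(
            a == b for a, b in zip(self, other)
        )

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    __hash__ = None  # stays unhashable, like list


nan = float("nan")
strict = StrictList([nan])
plain = [nan]
```

With these definitions `nan in plain` is true while `nan in strict` is false, and `strict == strict` is false; count() and index() would need the same treatment in a fuller version.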
[Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
Related to the discussion on "Not a Number", can I point out a few things that have not been explicitly addressed so far.

The IEEE standard is about hardware and bit patterns, rather than types and values, so may not be entirely appropriate for a high-level language like Python.

NaN is *not* a number (the clue is in the name). Python treats it as if it were a number:

    >>> import numbers
    >>> isinstance(nan, numbers.Number)
    True

Can be read as "'Not a Number' is a Number" ;)

NaN does not have to be a float or a Decimal. Perhaps it should have its own class. The default comparisons will then work as expected for collections. (No doubt, making NaN a new class will cause a whole new set of problems.)

As pointed out by Meyer: "NaN == NaN is False" is no more logical than "NaN != NaN is False".

Although both NaN == NaN and NaN != NaN could arguably be a "maybe" value, the all-important reflexivity (x == x is True) is effectively part of the language. All collections rely on it, and Python wouldn't be much use without dicts, tuples and lists.

To summarise:

NaN is required so that floating point operations on arrays and lists do not raise unwanted exceptions.

NaN is Not a Number (therefore should be neither a float nor a Decimal). Making it a new class would solve some of the problems discussed, but would create new problems instead.

Correct behaviour of collections is more important than IEEE conformance of NaN comparisons.

Mark.
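The isinstance observation in this message can be reproduced for both NaN-carrying numeric types named in it. A short sketch, assuming the standard library's numbers and decimal modules; the variable names are illustrative.

```python
import math
import numbers
from decimal import Decimal

float_nan = float("nan")
decimal_nan = Decimal("NaN")

# Both NaN values are instances of ordinary numeric types, so the
# abstract base class numbers.Number accepts them.
float_nan_is_number = isinstance(float_nan, numbers.Number)
decimal_nan_is_number = isinstance(decimal_nan, numbers.Number)

# Telling a NaN apart from other values therefore needs an explicit test.
float_is_nan = math.isnan(float_nan)
decimal_is_nan = decimal_nan.is_nan()
```

A type check cannot separate NaN from real numeric values; code that cares has to ask math.isnan() or Decimal.is_nan().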
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Nick Coghlan wrote:
> Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true *regardless* of IEEE754 by checking identity explicitly.

Aren't you making something of a circular argument here? You're saying that non-reflexive comparisons are okay because they don't interfere with certain critical invariants. But you're defining those invariants as the ones that don't happen to conflict with non-reflexive comparisons!

--
Greg
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
Mark Shannon wrote:
> NaN does not have to be a float or a Decimal. Perhaps it should have its own class.

Perhaps, but that wouldn't solve anything on its own. If this new class compares reflexively, then it still violates IEEE754. Conversely, existing NaNs could be made to compare reflexively without making them a new class.

--
Greg
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 6:30 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Thu, Apr 28, 2011 at 3:57 AM, Nick Coghlan ncogh...@gmail.com wrote: .. It is an interesting question of what sane invariants are. Why you consider the invariants that you listed essential while say if c1 == c2: assert all(x == y for x,y in zip(c1, c2)) optional? Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true *regardless* of IEEE754 by checking identity explicitly. AFAIK, IEEE754 says nothing about comparison of containers, so my invariant cannot violate it. What you probably wanted to say is that my invariant cannot be achieved in the presence of IEEE754 conforming floats, but this observation by itself does not make my invariant less important than yours. It just makes yours easier to maintain. No, I meant what I said. Your assertion includes a direct comparison between values (the x == y part) which means that IEEE754 has a bearing on whether or not it is a valid assertion. Every single one of my stated invariants consists solely of relationships between containers, or between a container and its contents. This keeps them all out of the domain of IEEE754 since the *container implementers* get to decide whether or not to factor object identity into the management of the container contents. The core containment invariant is really only this one: for x in c: assert x in c That is, if we iterate over a container, all entries returned should be in the container. Hopefully it is non-controversial that this is a sane and reasonable invariant for a container *user* to expect. 
The comparison invariants follow from the definition of set equivalence as:

    set1 == set2 iff all(x in set2 for x in set1) and all(y in set1 for y in set2)

Again, notice that there is no comparison of items here - merely a consideration of the way items relate to containers. The rationale behind the count() and index() assertions is harder to define in implementation neutral terms, but their behaviour does follow naturally from the internal enforcement of reflexivity needed to guarantee that core invariant.

In mathematics, this is all quite straightforward and non-controversial, since it can be taken for granted that equality is reflexive (as it's part of the definition of what equality *means* - equivalence relations *are* relations that are symmetric, transitive and reflexive. Lose any one of those three properties and it isn't an equivalence relation any more).

However, when we confront the practical reality of IEEE754 floating point values and the lack of reflexivity in the presence of NaN, we're faced with a choice of (at least) 4 alternatives:

1. Deny it. Say equality is reflexive at the language level, and we don't care that it makes it impossible to fully implement IEEE754 semantics. This is what Eiffel does, and if you don't care about interoperability and the possibility of algorithmic equivalence with hardware implementations, it's probably not a bad idea. After all, why discard centuries of mathematical experience based on a decision that the IEEE754 committee can't clearly recall the rationale for, and didn't clearly document?

2. Tolerate it, but attempt to confine the breakage of mathematical guarantees to the arithmetic operations actually covered by the relevant standards. This is what CPython currently does by enforcing the container invariants at an implementation level, and, as I think it's a good way to handle the situation, this is what I am advocating lifting up to the language level through appropriate updates to the library and language reference. (Note that even changing the behaviour of float() leaves Python in this situation, since third party types will still be free to follow IEEE754. Given that, it seems relatively pointless to change the behaviour of builtin floats after all the effort that has gone into bringing them ever closer to IEEE754).

3. Signal it. We already do this in some cases (e.g. for ZeroDivisionError), and I'm personally quite happy with the idea of raising ValueError in other cases, such as when attempting to perform ordering comparisons on NaN values.

4. Embrace it. Promote NaN to a language level construct, define semantics allowing it to propagate through assorted comparison and other operations (including short-circuiting logic operators) without being coerced to True as it is now.

Documenting the status quo is the *only* necessary step in all of this (and Raymond has already adopted the relevant tracker issue). There are tweaks to the current semantics that may be useful (specifically ValueError when attempting to order NaN), but changing the meaning of equality for floats probably isn't one of them (since that only fixes one type, while fixing the affected algorithms fixes *all* types).
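For illustration, here is a minimal sketch of the container invariants discussed above, checked against CPython's builtin list and set with a float NaN. The variable names are invented for this example, and the results reflect current CPython behaviour (the identity shortcut under discussion), not a documented language guarantee.

```python
nan = float("nan")
c = [1.0, nan, 2.0]

# IEEE754 semantics at the value level: NaN is not equal to itself.
assert nan != nan

# Core containment invariant: everything we iterate over is "in" the container.
for x in c:
    assert x in c

# count() and index() follow from the same identity-then-equality check.
assert c.count(nan) == 1
assert c.index(nan) == 1

# Container equality holds even though element-wise equality does not.
d = list(c)
assert c == d
assert not all(x == y for x, y in zip(c, d))

# Sets behave the same way when they hold the *same* NaN object.
assert {nan} == {nan}

# A different NaN object is not found: neither `is` nor `==` matches.
other_nan = float("nan")
assert other_nan not in c
```

The last assertion shows the limit of the approach: reflexivity is enforced per object, so two distinct NaN objects are still unequal inside a container.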
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 7:10 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Nick Coghlan wrote: Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true *regardless* of IEEE754 by checking identity explicitly. Aren't you making something of a circular argument here? You're saying that non-reflexive comparisons are okay because they don't interfere with certain critical invariants. But you're defining those invariants as the ones that don't happen to conflict with non-reflexive comparisons! No, I'm taking the existence of non-reflexive comparisons as a given (despite agreeing with Meyer from a theoretical standpoint) because: 1. IEEE754 works that way 2. Even if float() is changed to not work that way, 3rd party types may still do so 3. Supporting rich comparisons makes it impossible for Python to enforce reflexivity at the language level (even if we wanted to) However, as I detailed in my reply to Antoine, the critical container invariants I cite *don't include* direct object-object comparisons. Instead, they merely describe how objects relate to containers, and how containers relate to each other, using only the two basic rules that objects retrieved from a container should be in that container and that two sets are equivalent if they are each a subset of the other. The question then becomes, how do we reconcile the container invariants with the existence of non-reflexive definitions of equality at the type level, and the answer is to officially adopt the approach already used in the standard container types: enforce reflexive equality at the container level, so that it doesn't matter that some types provide a non-reflexive version. Cheers, Nick. 
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
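As a rough sketch of what "enforce reflexive equality at the container level" means, the two hypothetical helpers below contrast a membership test that checks identity before equality with one that relies on the type's own __eq__ alone. The function names are made up for this example; the first mirrors how the language reference describes `in` for the builtin sequence types.

```python
def contains(container, item):
    """Membership test that enforces reflexivity at the container level."""
    return any(candidate is item or candidate == item for candidate in container)


def strict_contains(container, item):
    """Membership test that trusts the type's own __eq__ and nothing else."""
    return any(candidate == item for candidate in container)


nan = float("nan")
values = [0.0, nan]

# Identity is checked first, so the NaN stored in the list is found.
assert contains(values, nan)
# Pure equality never finds it, because nan == nan is False.
assert not strict_contains(values, nan)
# For well-behaved values the two tests agree.
assert contains(values, 0.0) and strict_contains(values, 0.0)
```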
Re: [Python-Dev] the role of assert in the standard library ?
On 28/04/2011 09:34, Terry Reedy wrote: On 4/28/2011 3:54 AM, Tarek Ziadé wrote:

Hello I removed some assert calls in distutils some time ago because the package was not behaving correctly when people were using Python with the --optimize flag. In other words, assert became a full part of the code logic and removing them via -O was changing the behavior. In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :)

My understanding is that assert can be used in production code but only to catch logic errors by testing supposed invariants or postconditions. It should not be used to test usage errors, including preconditions. In other words, assert presence or absence should not affect behavior unless the code has a bug.

Agreed. We should ideally have buildbots doing test runs with -O and -OO. R. David Murray did a lot of work a year ago (or so) to ensure the test run passes with -OO but it easily degrades. There are a couple of asserts in unittest (for test discovery) but I only use them to provide failure messages early. The functionality is unchanged (and tests still pass) with -OO. All the best, Michael Foord

So, I grepped the stdlib for assert calls, and I have found 177 of them, and many of them make Python act differently depending on the -O flag. Here's an example on a randomly picked assert in the threading module:

This, to me is wrong:

    def __init__(self, group=None, target=None, name=None, args=(), kwargs=None, verbose=None):
        assert group is None, "group argument must be None for now"

That catches a usage error and should raise a ValueError. This

    def _wait(self, timeout):
        if not self._cond.wait_for(lambda : self._state != 0, timeout):
            #timed out. Break the barrier
            self._break()
            raise BrokenBarrierError
        if self._state < 0:
            raise BrokenBarrierError
        assert self._state == 1

appears to be, or should be, a test of a postcondition that should *always* be true regardless of usage.
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
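The distinction drawn above (exceptions for usage errors, assert only for conditions that can fail solely through a bug) can be sketched as follows. The Worker class is hypothetical and exists only for this example; it is not code from threading or distutils.

```python
class Worker:
    """Hypothetical class illustrating where assert is and is not appropriate."""

    def __init__(self, group=None):
        # Usage error: the caller passed a bad argument. This must be reported
        # even when asserts are stripped by -O, so raise a real exception.
        if group is not None:
            raise ValueError("group argument must be None for now")
        self._state = 0

    def finish(self):
        self._state = 1
        # Postcondition: this can only fail if the class itself has a bug,
        # so removing the check under -O does not change correct behaviour.
        assert self._state == 1, "finish() must leave the worker in state 1"
        return self._state


try:
    Worker(group="pool")
except ValueError as exc:
    message = str(exc)

final_state = Worker().finish()
```

Run with or without -O, the constructor rejects the bad argument in the same way, which is the property the thread is asking for.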
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
On Thu, Apr 28, 2011 at 7:17 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Mark Shannon wrote: NaN does not have to be a float or a Decimal. Perhaps it should have its own class. Perhaps, but that wouldn't solve anything on its own. If this new class compares reflexively, then it still violates IEEE754. Conversely, existing NaNs could be made to compare reflexively without making them a new class. And 3rd party NaNs can still do whatever the heck they want :) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Socket servers in the test suite
On Thu, 28 Apr 2011 07:23:43 + (UTC) Vinay Sajip vinay_sa...@yahoo.co.uk wrote: Nick Coghlan ncoghlan at gmail.com writes: If you poke around in the test directory a bit, you may find there is already some code along these lines in other tests (e.g. I'm pretty sure the urllib tests already fire up a local server). Starting down the path of standardisation of that test functionality would be good. I have poked around, and each test module pretty much does its own thing. Perhaps that's unavoidable; I'll try and see if there are usable common patterns in the specific instances. For larger components like this, it's also reasonable to add a dedicated helper module rather than using test.support directly. I started (and Antoine improved) something along those lines with the test.script_helper module for running Python subprocesses and checking their output, although it lacks documentation and there are lots of older tests that still use subprocess directly. Yes, I thought perhaps it was too specialised for adding to test.support itself. You can also take a look at Lib/test/ssl_servers.py. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, Apr 28, 2011 at 12:27 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: On 28/04/2011 09:34, Terry Reedy wrote: On 4/28/2011 3:54 AM, Tarek Ziadé wrote: Hello I removed some assert calls in distutils some time ago because the package was not behaving correctly when people were using Python with the --optimize flag. In other words, assert became a full part of the code logic and removing them via -O was changing the behavior. In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) My understanding is that assert can be used in production code but only to catch logic errors by testing supposed invariants or postconditions. It should not be used to test usage errors, including preconditions. In other words, assert presence or absence should not affect behavior unless the code has a bug. Agreed. We should ideally have buildbots doing test runs with -O and -OO. R. David Murray did a lot of work a year ago (or so) to ensure the test run passes with -OO but it easily degrades.. There are a couple of asserts in unittest (for test discovery) but I only use them to provide failure messages early. The functionality is unchanged (and tests still pass) with -OO. All the best, I'll try to add a useful report on bad asserts in the bug tracker. I am replying again to this on Python-ideas because I want to debate on assert :) Cheers Tarek ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Socket servers in the test suite
2011/4/27 Vinay Sajip vinay_sa...@yahoo.co.uk:

I've been recently trying to improve the test coverage for the logging package, and have got to a not unreasonable point:

    logging/__init__.py 99% (96%)
    logging/config.py   89% (85%)
    logging/handlers.py 60% (54%)

where the figures in parentheses include branch coverage measurements. I'm at the point where to appreciably increase coverage, I'd need to write some test servers to exercise client code in SocketHandler, DatagramHandler and HTTPHandler. I notice there are no utility classes in test.support to help with this kind of thing - would there be any mileage in adding such things? Of course I could add test server code just to test_logging (which already contains some socket server code to exercise the configuration functionality), but rolling a test server involves boilerplate such as using a custom RequestHandler-derived class for each application. I had in mind a more streamlined approach where you can just pass a single callable to a server to handle requests, e.g. as outlined in https://gist.github.com/945157 I'd be grateful for any comments about adding such functionality to e.g. test.support. Regards, Vinay Sajip

I agree having a standard server framework for tests would be useful, because it's something which appears quite often (e.g. when writing functional tests).
See for example: http://hg.python.org/cpython/file/b452559eee71/Lib/test/test_os.py#l1316 http://hg.python.org/cpython/file/b452559eee71/Lib/test/test_ftplib.py#l211 http://hg.python.org/cpython/file/b452559eee71/Lib/test/test_ssl.py#l844 http://hg.python.org/cpython/file/b452559eee71/Lib/test/test_smtpd.py http://hg.python.org/cpython/file/b452559eee71/Lib/test/test_poplib.py#l115 Regards --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
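To make the "pass a single callable to a server" idea concrete, here is a minimal sketch built on the standard socketserver module. CallableServer, its start()/stop() methods and the record() callback are hypothetical names for this example; this is not the code from the gist and not an existing test.support API.

```python
import socket
import socketserver
import threading


class CallableServer(socketserver.ThreadingTCPServer):
    """Hypothetical helper: a TCP server that hands each request to one callable."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, callback):
        # Build the RequestHandler subclass here so callers never write one.
        class _Handler(socketserver.StreamRequestHandler):
            def handle(self):
                callback(self)

        super().__init__(address, _Handler)
        self._thread = None

    def start(self):
        # Serve in a background thread so the test can act as the client.
        self._thread = threading.Thread(
            target=self.serve_forever, kwargs={"poll_interval": 0.05}, daemon=True
        )
        self._thread.start()

    def stop(self):
        self.shutdown()
        self.server_close()
        self._thread.join()


received = []


def record(request):
    # The callable gets the handler, so rfile/wfile are available.
    received.append(request.rfile.readline())
    request.wfile.write(b"ok\n")


server = CallableServer(("127.0.0.1", 0), record)  # port 0: let the OS pick one
server.start()
try:
    with socket.create_connection(server.server_address) as sock:
        sock.sendall(b"hello\n")
        reply = sock.makefile("rb").readline()
finally:
    server.stop()
```

Binding to port 0 avoids clashes between tests that run in parallel, and the start/stop pair could equally be wrapped in a decorator or context manager.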
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, 28 Apr 2011 09:54:23 +0200 Tarek Ziadé ziade.ta...@gmail.com wrote: I have seen some other places where things would simply break with -O. Am I right thinking we should do a pass on those and remove them or turn them into exceptions that are triggered with -O as well ? Agreed. Argument checking should not depend on the -O flag. Regards Antoine.
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
I am not a specialist in this area (although I call myself a mathematician). But they say that sometimes the outsider sees most of the game, or more likely that sometimes the idiot's point of view is useful.

To me the idea of non-reflexive equality (an object not being equal to itself) is abhorrent. Nothing is more likely to put off new Python users if they happen to run into it. And I bet even very experienced programmers will be tripped up by it a good proportion of the time they hit it. Basically it's deferring to a wart, of dubious value, in floating point calculations and/or the IEEE754 standard, and allowing it to become a monstrous carbuncle disfiguring the whole language. I think implementations of equal/not-equal which make equality non-reflexive (and thus break "identity implies equality") should be considered broken.

On 27/04/2011 15:53, Guido van Rossum wrote: Maybe we should just call off the odd NaN comparison behavior?

Right on, Guido. (A pity that a lot of people don't seem to be listening.)

On 27/04/2011 17:05, Isaac Morland wrote: Python could also provide IEEE-754 equality as a function (perhaps in math), something like:

    def ieee_equal(a, b):
        return a == b and not isnan(a) and not isnan(b)

Quite. If atypical behaviour is required in specialised areas, it can be coded for. (Same goes for specialised functions for comparing lists, dictionaries etc. in non-standard ways. Forced explicit is better than well-hidden implicit.)

Of course, the definition of math.isnan cannot then be by checking its argument by comparison with itself

Damn right - a really dirty trick if ever I saw one (not even proof against the introduction of new objects which also have the same perverse non-reflexive equality).

- it would have to check the appropriate bits of the float representation.

So it should.
On 28/04/2011 11:11, Nick Coghlan wrote: After all, why discard centuries of mathematical experience based on a decision that the IEEE754 committee can't clearly recall the rationale for, and didn't clearly document?

Sorry Nick, I have quoted you out of context - you WEREN'T arguing for the same point of view. But you express it much better than I could. It occurred to me that the very length of this thread [so far!] perfectly illustrates how controversial non-reflexive equality is. (BTW I have read, if not understood, every post to this thread and will continue to read them all.) And then I came across:

On 28/04/2011 09:43, Alexander Belopolsky wrote: If nothing else, annual reoccurrence of long threads on this topic is a reason enough to reconsider which standard to follow.

Aha, this is a regular, is it? 'Nuff said! Best wishes Rob Cliffe
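A sketch of the two pieces suggested in this message: an isnan that inspects the IEEE754 binary64 bit pattern instead of comparing a value with itself, and an explicit ieee_equal built on it. The name isnan_bits is made up for this example. In today's Python, a == b is already False for NaN, so ieee_equal only becomes necessary in the hypothetical world where == on floats is made reflexive.

```python
import math
import struct


def isnan_bits(x):
    """Detect NaN from the IEEE754 binary64 bit pattern, without using ==."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    exponent = (bits >> 52) & 0x7FF       # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)     # 52 fraction bits
    # NaN: exponent all ones and a non-zero fraction (a zero fraction is infinity).
    return exponent == 0x7FF and fraction != 0


def ieee_equal(a, b):
    """IEEE754-style equality: NaN is unequal to everything, itself included."""
    return a == b and not isnan_bits(a) and not isnan_bits(b)


nan = float("nan")
samples = [nan, float("inf"), float("-inf"), 0.0, -0.0, 1.5]

# The bit-level test agrees with math.isnan on every sample.
agreement = all(isnan_bits(v) == math.isnan(v) for v in samples)
```

Because isnan_bits never calls __eq__, it keeps working regardless of how equality on floats is defined.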
Re: [Python-Dev] Simple XML-RPC server over SSL/TLS
Hi, But what I would like to know is whether there is any reason why XML-RPC can't optionally work over TLS/SSL using Python's ssl module. I'll create a ticket, and send a patch, but I was wondering if there was a reason why this was not implemented. I think there’s no deeper reason than nobody thought about it. The ssl module is new in 2.6 and 3.x, xmlrpc is an older module for an old technology *cough*, so feel free to open a bug report. Patch guidelines are found at http://docs.python.org/devguide Thanks in advance! Cheers
Re: [Python-Dev] Socket servers in the test suite
Hi, I'm at the point where to appreciably increase coverage, I'd need to write some test servers to exercise client code in SocketHandler, DatagramHandler and HTTPHandler. I notice there are no utility classes in test.support to help with this kind of thing - would there be any mileage in adding such things? Of course I could add test server code just to test_logging (which already contains some socket server code to exercise the configuration functionality), but rolling a test server involves boilerplate such as using a custom RequestHandler-derived class for each application. I had in mind a more streamlined approach where you can just pass a single callable to a server to handle requests, A generic test helper to run a server for tests would be a great addition. In distutils/packaging (due to be merged into 3.3 Really Soon Now™), we also have a server, to test PyPI-related functionality. It’s a tested module providing a server class that runs in a thread, a SimpleHTTPRequest handler able to serve static files and reply to XML-RPC requests, and decorators to start and stop the server for one test method instead of a whole TestCase instance. I’m sure some common ground can be found and all these testing helpers factored out in one module. For larger components like this, it's also reasonable to add a dedicated helper module rather than using test.support directly. I started (and Antoine improved) something along those lines with the test.script_helper module for running Python subprocesses and checking their output, +1, script_helper is great. Cheers ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes issue10761: tarfile.extractall failure when symlinked files are
Hi, I’m still educating myself about concurrency and race conditions, so I hope my naïve question won’t be just a waste of time. Here it is:

http://hg.python.org/cpython/rev/0c8bc3a0130a
user: Senthil Kumaran orsent...@gmail.com
summary: Fix closes issue10761: tarfile.extractall failure when symlinked files are present.

diff --git a/Lib/tarfile.py b/Lib/tarfile.py
--- a/Lib/tarfile.py
+++ b/Lib/tarfile.py
@@ -2239,6 +2239,8 @@
         if hasattr(os, "symlink") and hasattr(os, "link"):
             # For systems that support symbolic and hard links.
             if tarinfo.issym():
+                if os.path.exists(targetpath):
+                    os.unlink(targetpath)

Is there a race condition here? Thanks Regards
Re: [Python-Dev] [Python-checkins] cpython (3.2): Closes #11858: configparser.ExtendedInterpolation and section case.
Hi,

http://hg.python.org/cpython/rev/57c076ab4bbd
user: Łukasz Langa luk...@langa.pl

--- a/Lib/test/test_cfgparser.py
+++ b/Lib/test/test_cfgparser.py
@@ -20,10 +20,16 @@
     def values(self):
         return [i[1] for i in self.items()]

-    def iteritems(self): return iter(self.items())
-    def iterkeys(self): return iter(self.keys())
+    def iteritems(self):
+        return iter(self.items())
+
+    def iterkeys(self):
+        return iter(self.keys())
+
+    def itervalues(self):
+        return iter(self.values())
+

The dict methods in that subclass could probably be cleaned up. Regards
Re: [Python-Dev] [Python-checkins] cpython: Refined time test in test_logging.
Hi,

http://hg.python.org/cpython/rev/5185e1d91f3d
user: Vinay Sajip vinay_sa...@yahoo.co.uk
summary: Refined time test in test_logging.

+ZERO = datetime.timedelta(0)
+
+class UTC(datetime.tzinfo):
+    def utcoffset(self, dt):
+        return ZERO
+
+    dst = utcoffset
+
+    def tzname(self, dt):
+        return 'UTC'
+
+utc = UTC()

Any reason not to use datetime.timezone.utc here? Regards
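For comparison, a short sketch showing that the hand-written UTC class from the commit and the standard library's datetime.timezone.utc singleton (available since Python 3.2) describe the same instant. The sample date is arbitrary and chosen only for this example.

```python
import datetime

ZERO = datetime.timedelta(0)


class UTC(datetime.tzinfo):
    """Hand-rolled UTC tzinfo, as in the commit quoted above."""

    def utcoffset(self, dt):
        return ZERO

    dst = utcoffset

    def tzname(self, dt):
        return 'UTC'


utc = UTC()

custom = datetime.datetime(2011, 4, 28, 12, 0, tzinfo=utc)
builtin = datetime.datetime(2011, 4, 28, 12, 0, tzinfo=datetime.timezone.utc)

# Aware datetimes compare by the instant they denote, so the two are equal.
same_instant = custom == builtin
same_offset = custom.utcoffset() == builtin.utcoffset() == ZERO
```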
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 04:34 AM, Terry Reedy wrote: On 4/28/2011 3:54 AM, Tarek Ziadé wrote: Hello I removed some assert calls in distutils some time ago because the package was not behaving correctly when people were using Python with the --optimize flag. In other words, assert became a full part of the code logic and removing them via -O was changing the behavior. In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) My understanding is that assert can be used in production code but only to catch logic errors by testing supposed invariants or postconditions. It should not be used to test usage errors, including preconditions. In other words, assert presence or absence should not affect behavior unless the code has a bug. I would agree. Use asserts for "this can't possibly happen" <wink> conditions. -Barry
Re: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes issue10761: tarfile.extractall failure when symlinked files are
On Thu, Apr 28, 2011 at 04:20:06PM +0200, Éric Araujo wrote:

         if hasattr(os, "symlink") and hasattr(os, "link"):
             # For systems that support symbolic and hard links.
             if tarinfo.issym():
+                if os.path.exists(targetpath):
+                    os.unlink(targetpath)

Is there a race condition here?

The lock to avoid race conditions (if you were thinking along those lines) would usually be implemented at the higher level code which is using extractall in threads. Checking that no one else is accessing the file before unlinking may not be suitable for the library method and of course, we cannot check if someone is waiting to act on that file. -- Senthil
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
Mark Shannon wrote: Related to the discussion on Not a Number can I point out a few things that have not been explicitly addressed so far. The IEEE standard is about hardware and bit patterns, rather than types and values so may not be entirely appropriate for a high-level language like Python.

I would argue that the implementation of NANs is irrelevant. If NANs are useful in hardware floats -- and I think they are -- then they're just as equally useful as objects, or as strings in languages like REXX or Hypertalk where all data is stored as strings, or as quantum wave functions in some future quantum computer.

NaN is *not* a number (the clue is in the name). Python treats it as if it were a number:

    >>> import numbers
    >>> isinstance(nan, numbers.Number)
    True

Can be read as "'Not a Number' is a Number" ;)

I see your wink, but what do you make of these?

    class NotAnObject(object):
        pass

    nao = NotAnObject()
    assert isinstance(nao, object)

    class NotAType(object):
        pass

    assert type(NotAType) is type

NaN does not have to be a float or a Decimal. Perhaps it should have its own class.

Others have already pointed out this won't make any difference. Fundamentally, the problem is that some containers bypass equality tests for identity tests. There may be good reasons for that shortcut, but it leads to problems with *any* object that does not define equality to be reflexive, not just NANs.

    >>> class Null:
    ...     def __eq__(self, other):
    ...         return False
    ...
    >>> null = Null()
    >>> null == null
    False
    >>> [null] == [null]
    True

The default comparisons will then work as expected for collections. (No doubt, making NaN a new class will cause a whole new set of problems)

As pointed out by Meyer: "NaN == NaN is False" is no more logical than "NaN != NaN is False"

I don't agree with this argument. I think Meyer is completely mistaken there.
The question of NAN equality is that of a vacuous truth, quite similar to the Present King of France: http://en.wikipedia.org/wiki/Present_King_of_France Meyer would have us accept that: The present King of France is a talking horse and The present King of France is not a talking horse are equally (pun not intended) valid. No, no they're not. I don't know much about who the King of France would be if France had a king, but I do know that he wouldn't be a talking horse. Once you accept that NANs aren't equal to anything else, it becomes a matter of *practicality beats purity* to accept that they can't be equal to themselves either. A NAN doesn't represent a specific thing. It's a signal that your calculation has generated an indefinite, undefined, undetermined value. NANs aren't equal to anything. The fact that a NAN happens to have an existence as a bit-pattern at some location, or as a distinct object, is an implementation detail that is irrelevant. If you just happen by some fluke to compare a NAN to itself, that shouldn't change the result of the comparison: The present King of France is the current male sovereign who rules France is still false, even if you happen to write it like this: The present King of France is the present King of France This might seem surprising to those who are used to reflexivity. Oh well. Just because reflexivity holds for actual things, doesn't mean it holds for, er, things that aren't things. NANs are things that aren't things. Although both NaN == NaN and NaN != NaN could arguably be a maybe value, the all important reflexivity (x == x is True) is effectively part of the language. All collections rely on it and Python wouldn't be much use without dicts, tuples and lists. Perhaps they shouldn't rely on it. Identity tests are an implementation detail. But in any case, reflexivity is *not* a guarantee of Python. With rich comparisons, you can define __eq__ to do anything you like. 
-- Steven
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, Apr 28, 2011 at 10:37 AM, Barry Warsaw ba...@python.org wrote: I would agree. Use asserts for "this can't possibly happen" <wink> conditions. Maybe we should rename assert to wink, just to be clear on the usage. :-) -Fred -- Fred L. Drake, Jr. fdrake at acm.org Give me the luxuries of life and I will willingly do without the necessities. --Frank Lloyd Wright
Re: [Python-Dev] the role of assert in the standard library ?
Barry> I would agree. Use asserts for "this can't possibly happen"
Barry> <wink> conditions.

Without looking, I suspect that's probably what the author thought he was doing. Skip
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 10:22 AM, s...@pobox.com wrote:

Barry> I would agree. Use asserts for "this can't possibly happen"
Barry> <wink> conditions.

Without looking, I suspect that's probably what the author thought he was doing.

BTW, I think it always helps to have a really good assert message, and/or a leading comment to explain *why* that condition can't possibly happen. -Barry
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 11:04 AM, Fred Drake wrote: On Thu, Apr 28, 2011 at 10:37 AM, Barry Warsaw ba...@python.org wrote: I would agree. Use asserts for "this can't possibly happen" <wink> conditions. Maybe we should rename assert to wink, just to be clear on the usage. :-) Off to python-ideas for you! <wink> -Barry
Re: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes issue10761: tarfile.extractall failure when symlinked files are
On Thu, 28 Apr 2011 22:44:50 +0800 Senthil Kumaran orsent...@gmail.com wrote:
> On Thu, Apr 28, 2011 at 04:20:06PM +0200, Éric Araujo wrote:
>>      if hasattr(os, "symlink") and hasattr(os, "link"):
>>          # For systems that support symbolic and hard links.
>>          if tarinfo.issym():
>> +            if os.path.exists(targetpath):
>> +                os.unlink(targetpath)
>>
>> Is there a race condition here?
>
> The lock to avoid race conditions (if you were thinking along those
> lines) would usually be implemented at the higher level code which is
> using extractall in threads.

A lock would only protect against multi-threaded use of the tarfile module, which is probably quite rare and therefore not a real concern. The kind of race condition which can happen here is if an attacker creates "targetpath" between os.path.exists and os.unlink. Whether it is an exploitable flaw would need a detailed analysis, of course.

Regards

Antoine.
Re: [Python-Dev] [Python-checkins] cpython (2.7): Fix closes issue10761: tarfile.extractall failure when symlinked files are
On Thu, Apr 28, 2011 at 4:44 PM, Senthil Kumaran orsent...@gmail.com wrote:
> On Thu, Apr 28, 2011 at 04:20:06PM +0200, Éric Araujo wrote:
>>      if hasattr(os, "symlink") and hasattr(os, "link"):
>>          # For systems that support symbolic and hard links.
>>          if tarinfo.issym():
>> +            if os.path.exists(targetpath):
>> +                os.unlink(targetpath)
>>
>> Is there a race condition here?
>
> The lock to avoid race conditions (if you were thinking along those
> lines) would usually be implemented at the higher level code which is
> using extractall in threads.
>
> Checking that no one else is accessing the file before unlinking may
> not be suitable for the library method and of course, we cannot check
> if someone is waiting to act on that file.

I think Éric is referring to the possibility of another process creating or deleting targetpath between the calls to os.path.exists() and os.unlink(). This would result in symlink() or unlink() raising an exception.

The deletion case could be handled like this:

     if tarinfo.issym():
+        try:
+            os.unlink(targetpath)
+        except OSError as e:
+            if e.errno != errno.ENOENT:
+                raise
         os.symlink(tarinfo.linkname, targetpath)

I'm not sure what the best way of handling the creation case is. The obvious solution would be to try the above code in a loop, repeating until we succeed (or fail for a different reason), but this would not be guaranteed to terminate.

Cheers,
Nadeem
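The deletion-side handling described in the message above can be sketched as a standalone helper. This is only an illustration, assuming a POSIX-like system where os.symlink() is available; the name replace_symlink is hypothetical and is not part of the tarfile module. It removes the window between os.path.exists() and os.unlink(), but, as the message says, it does nothing about another process creating targetpath just before os.symlink() runs.

```python
import errno
import os
import tempfile


def replace_symlink(linkname, targetpath):
    """Hypothetical helper: point targetpath at linkname, replacing any old entry.

    Instead of checking os.path.exists() first, try the unlink and ignore
    only the "no such file" error, so a concurrent deletion is harmless.
    """
    try:
        os.unlink(targetpath)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise  # permission problems etc. are still reported
    # Still racy: another process may create targetpath right here,
    # in which case os.symlink() raises an OSError.
    os.symlink(linkname, targetpath)


# Small demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "link")
    replace_symlink("first", target)   # nothing to unlink yet
    replace_symlink("second", target)  # old link is removed, then recreated
    final_destination = os.readlink(target)
```

Calling the helper twice on the same path succeeds both times, which is the case the original os.path.exists() check was meant to cover.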
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, Apr 28, 2011 at 5:26 PM, Barry Warsaw ba...@python.org wrote:
> On Apr 28, 2011, at 10:22 AM, s...@pobox.com wrote:
>>     Barry> I would agree. Use asserts for "this can't possibly happen"
>>     Barry> <wink> conditions.
>>
>> Without looking, I suspect that's probably what the author thought he
>> was doing.
>
> BTW, I think it always helps to have a really good assert message, and/or
> a leading comment to explain *why* that condition can't possibly happen.

why bother, it can't happen ;)

--
Tarek Ziadé | http://ziade.org
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/27/11 11:54 PM, Guido van Rossum wrote: On Wed, Apr 27, 2011 at 9:25 PM, Robert Kernrobert.k...@gmail.com wrote: On 2011-04-27 23:01 , Guido van Rossum wrote: And I wouldn't want to change that. It sounds like NumPy wouldn't be much affected if we were to change this (which I'm not saying we would). Well, I didn't say that. If Python changed its behavior for (float('nan') == float('nan')), we'd have to seriously consider some changes. Ah, but I'm not proposing anything of the sort! float('nan') returns a new object each time and two NaNs that are not the same *object* will still follow the IEEE std. It's just when comparing a NaN-valued *object* to *itself* (i.e. the *same* object) that I would consider following the lead of Python's collections. Ah, I see! We do like to keep *some* amount of correspondence with Python semantics. In particular, we like our scalar types that match Python types to work as close to the Python type as possible. We have the np.float64 type, which represents a C double scalar and corresponds to a Python float. It is used when a single item is indexed out of a float64 array. We even subclass from the Python float type to help working with libraries that may not know about numpy: [~] |5 import numpy as np [~] |6 nan = np.array([1.0, 2.0, float('nan')])[2] [~] |7 nan == nan False Yeah, this is where things might change, because it is the same *object* left and right. [~] |8 type(nan) numpy.float64 [~] |9 type(nan).mro() [numpy.float64, numpy.floating, numpy.inexact, numpy.number, numpy.generic, float, object] If the Python float type changes behavior, we'd have to consider whether to keep that for np.float64 or change it to match the usual C semantics used elsewhere. So there *would* be a dilemma. Not necessarily the most nerve-wracking one, but a dilemma nonetheless. Given what I just said, would it still be a dilemma? Maybe a smaller one? Smaller, certainly. But now it's a trilemma. :-) 1. 
Have just np.float64 and np.complex128 scalars follow the Python float semantics since they subclass Python float and complex, respectively. 2. Have all np.float* and np.complex* scalars follow the Python float semantics. 3. Keep the current IEEE-754 semantics for all float scalar types. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/11 12:37 AM, Alexander Belopolsky wrote:
> On Thu, Apr 28, 2011 at 12:33 AM, Robert Kern robert.k...@gmail.com wrote:
>> On 2011-04-27 23:24 , Guido van Rossum wrote:
> ..
>>> So do new masks get created when the outcome of an elementwise
>>> operation is a NaN?
>>
>> No.
>
> Yes.
>
> >>> from MA import array
> >>> print array([0])/array([0])
> [-- ]
>
> (I don't have numpy on this laptop, so the example is using Numeric,
> but I hope you guys did not change that while I was not looking:-)

This behavior is not what you think it is. Rather, some binary operations have been augmented with a domain of validity, and the results will be masked out when the domain is violated. Division is one of them, and division by zero will cause the result to be masked. You can produce NaNs in other ways that will not be masked in both numpy and old Numeric:

[~] |4> minf = np.ma.array([1e300]) * np.ma.array([1e300])
Warning: overflow encountered in multiply

[~] |5> minf
masked_array(data = [ inf],
             mask = False,
       fill_value = 1e+20)

[~] |6> minf - minf
masked_array(data = [ nan],
             mask = False,
       fill_value = 1e+20)

[~] |14> import MA

[~] |15> minf = MA.array([1e300]) * MA.array([1e300])

[~] |16> minf
array([ inf,])

[~] |17> (minf - minf)[0]
nan

[~] |25> (minf - minf)._mask is None
True

Numeric has a bug where it cannot print arrays with NaNs, so I just grabbed the element out instead of showing it. But I guarantee you that it is not masked.

Masked arrays are not a way to avoid NaNs arising from computations. NaN handling is an important part of computing with numpy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
Steven D'Aprano wrote: Mark Shannon wrote: Related to the discussion on Not a Number can I point out a few things that have not be explicitly addressed so far. The IEEE standard is about hardware and bit patterns, rather than types and values so may not be entirely appropriate for high-level language like Python. I would argue that the implementation of NANs is irrelevant. If NANs are useful in hardware floats -- and I think they are -- then they're just as equally useful as objects, or as strings in languages like REXX or Hypertalk where all data is stored as strings, or as quantum wave functions in some future quantum computer. So, Indeed, so its OK if type(NaN) != type(0.0) ? NaN is *not* a number (the clue is in the name). Python treats it as if it were a number: import numbers isinstance(nan, numbers.Number) True Can be read as 'Not a Number' is a Number ;) I see your wink, but what do you make of these? class NotAnObject(object): pass nao = NotAnObject() assert isinstance(nao, object) Trying to make something not an object in a language where everything is an object is bound to be problematic. class NotAType(object): pass assert type(NotAType) is type NaN does not have to be a float or a Decimal. Perhaps it should have its own class. Others have already pointed out this won't make any difference. Fundamentally, the problem is that some containers bypass equality tests for identity tests. There may be good reasons for that shortcut, but it leads to problems with *any* object that does not define equality to be reflexive, not just NANs. class Null: ... def __eq__(self, other): ... return False ... null = Null() null == null False [null] == [null] True Just because you can do that, doesn't mean you should. Equality should be reflexive, without that fundamental assumption many non-numeric algorithms fall apart. The default comparisons will then work as expected for collections. 
(No doubt, making NaN a new class will cause a whole new set of problems) As pointed out by Meyer: NaN == NaN is False is no more logical than NaN != NaN is False I don't agree with this argument. I think Meyer is completely mistaken there. The question of NAN equality is that of a vacuous truth, quite similar to the Present King of France: http://en.wikipedia.org/wiki/Present_King_of_France Meyer would have us accept that: The present King of France is a talking horse and The present King of France is not a talking horse are equally (pun not intended) valid. No, no they're not. I don't know much about who the King of France would be if France had a king, but I do know that he wouldn't be a talking horse. Once you accept that NANs aren't equal to anything else, it becomes a matter of *practicality beats purity* to accept that they can't be equal Not breaking a whole bunch of collections and algorithms has a certain practical appeal as well ;) to themselves either. A NAN doesn't represent a specific thing. It's a signal that your calculation has generated an indefinite, undefined, undetermined value. NANs aren't equal to anything. The fact that a NAN happens to have an existence as a bit-pattern at some location, or as a distinct object, is an implementation detail that is irrelevant. If you just happen by some fluke to compare a NAN to itself, that shouldn't change the result of the comparison: The present King of France is the current male sovereign who rules France is still false, even if you happen to write it like this: The present King of France is the present King of France The problem with this argument is the present King of France does not exist, whereas NaN (as a Python object) does exist. The present King of France argument only applies to non-existent things. Python objects do exist (as much as any computer language entity exists). 
So the expression The present King of France either raises an exception (non-existence) or evaluates to an object (existence). In this case the present King of France doesn't exist and should raise a FifthRepublicException :) inf / inf does not raise an exception, but evaluates to NaN, so NaN exists. For objects (that exist): (x is x) is True. The present President of France is the present President of France, regardless of who he or she may be. This might seem surprising to those who are used to reflexivity. Oh well. Just because reflexivity holds for actual things, doesn't mean it holds for, er, things that aren't things. NANs are things that aren't things. A NaN is thing that *is* a thing; it exists: object.__repr__(float('nan')) Of course if inf - inf, inf/inf raised exceptions, then NaN wouldn't exist (as a Python object) and the problem would just go away :) After all 0.0/0.0 already raises an exception, but the IEEE defines 0.0/0.0 as NaN. Although both NaN == NaN and NaN != NaN could arguably be a maybe
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
On 28/04/2011 15:58, Steven D'Aprano wrote:
> Fundamentally, the problem is that some containers bypass equality
> tests for identity tests. There may be good reasons for that shortcut,
> but it leads to problems with *any* object that does not define
> equality to be reflexive, not just NANs.

I say you have that backwards. It is a legitimate shortcut, and any object that (perversely) doesn't define equality to be reflexive leads (unsurprisingly) to problems with it (and with *anything else* that - very reasonably - assumes that identity implies equality).

> Mark Shannon wrote:
>> Although both NaN == NaN and NaN != NaN could arguably be a "maybe"
>> value, the all important reflexivity (x == x is True) is effectively
>> part of the language. All collections rely on it and Python wouldn't
>> be much use without dicts, tuples and lists.
>
> Perhaps they shouldn't rely on it. Identity tests are an implementation
> detail. But in any case, reflexivity is *not* a guarantee of Python.
> With rich comparisons, you can define __eq__ to do anything you like.

And you can write True = False (at least in older versions of Python you could). No language stops you from writing stupid programs.

In fact I would propose that the language should DEFINE the meaning of "==" to be True if its operands are identical, and only if they are not would it use the comparison operators, thus enforcing reflexivity. (Nothing stops you from writing your own non-reflexive __eq__ and calling it explicitly, and I think it is right that you should have to work harder and be more explicit if you want that behaviour.)

Please, please, can we have a bit of common sense and perspective here. No-one (not even a mathematician) except someone from Wonderland would seriously want an object not equal to itself.

Regards
Rob Cliffe
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
Mark Shannon wrote: Steven D'Aprano wrote: Mark Shannon wrote: Related to the discussion on Not a Number can I point out a few things that have not be explicitly addressed so far. The IEEE standard is about hardware and bit patterns, rather than types and values so may not be entirely appropriate for high-level language like Python. I would argue that the implementation of NANs is irrelevant. If NANs are useful in hardware floats -- and I think they are -- then they're just as equally useful as objects, or as strings in languages like REXX or Hypertalk where all data is stored as strings, or as quantum wave functions in some future quantum computer. So, Indeed, so its OK if type(NaN) != type(0.0) ? Sure. But that just adds complexity without actually resolving anything. Fundamentally, the problem is that some containers bypass equality tests for identity tests. There may be good reasons for that shortcut, but it leads to problems with *any* object that does not define equality to be reflexive, not just NANs. [...] Just because you can do that, doesn't mean you should. Equality should be reflexive, without that fundamental assumption many non-numeric algorithms fall apart. So what? If I have a need for non-reflexivity in my application, why should I care that some other algorithm, which I'm not using, will fail? Python supports non-reflexivity. If I take advantage of that feature, I can't guarantee that *other objects* will be smart enough to understand this. This is no different from any other property of my objects. The default comparisons will then work as expected for collections. (No doubt, making NaN a new class will cause a whole new set of problems) As pointed out by Meyer: NaN == NaN is False is no more logical than NaN != NaN is False I don't agree with this argument. I think Meyer is completely mistaken there. 
The question of NAN equality is that of a vacuous truth, quite similar to the Present King of France: http://en.wikipedia.org/wiki/Present_King_of_France [...] The problem with this argument is the present King of France does not exist, whereas NaN (as a Python object) does exist. NANs (as Python objects) exist in the same way as the present King of France exists as words. It's an implementation detail: we can't talk about the non-existent present King of France without using words, and we can't do calculations on non-existent/indeterminate values in Python without objects. Words can represent things that don't exist, and so can bit-patterns or objects or any other symbol. We must be careful to avoid mistaking the symbol (the NAN bit-pattern or object) for the thing (the result of whatever calculation generated that NAN). The idea of equality we care about is equality of what the symbol represents, not the symbol itself. The meaning of spam and eggs should not differ according to the typeface we write the words in. Likewise the number 42 should not differ according to how the int object is laid out, or whether the bit-pattern is little-endian or big-endian. What matters is the thing itself, 42, not the symbol: it will still be 42 even if we decided to write it in Roman numerals or base 13. Likewise, what matters is the non-thingness of NANs, not the fact that the symbol for them has an existence as an object or a bit-pattern. -- Steven
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
[This is a mega-reply, combining responses to several messages in this thread. I may be repeating myself a bit, but I think I am being consistent. :-)] On Wed, Apr 27, 2011 at 10:12 PM, Nick Coghlan ncogh...@gmail.com wrote: On Thu, Apr 28, 2011 at 2:54 PM, Guido van Rossum gu...@python.org wrote: Well, I didn't say that. If Python changed its behavior for (float('nan') == float('nan')), we'd have to seriously consider some changes. Ah, but I'm not proposing anything of the sort! float('nan') returns a new object each time and two NaNs that are not the same *object* will still follow the IEEE std. It's just when comparing a NaN-valued *object* to *itself* (i.e. the *same* object) that I would consider following the lead of Python's collections. The reason this possibility bothers me is that it doesn't mesh well with the implementations are free to cache and reuse immutable objects rule. Although, if the updated NaN semantics were explicit that identity was now considered part of the value of NaN objects (thus ruling out caching them at the implementation layer), I guess that objection would go away. The rules for float could be expanded to disallow NaN caching. But even if we didn't change any rules, reusing immutable objects could currently make computations undefined, because container comparisons use the identity wins rule. E.g. if we didn't change the rule for nan==nan, but we did change float(nan) to always return a specific singleton, comparisons like [float(nan)] == [float(nan)] would change in outcome. (Note that not all NaNs could be the same object, since there are multiple bit patterns meaning NaN; IIUC this is different from Inf.) All this makes me realize that there would be another issue, one that I wouldn't know how to deal with: a JITting interpreter could translate code involving floats into machine code, at which point object identity would be lost (presumably the machine code would use IEEE value semantics for NaN). 
This also reminds me that the current "identity wins" rules for containers, combined with the "NaN==NaN is always False" rule for non-container contexts, theoretically also might pose constraints on the correctness of certain JIT optimizations. I don't know if PyPy optimizes any code involving tuples or lists of floats, so I don't know if it is a problem in practice, but it does seem to pose a complex constraint in theory.

TBH whatever Raymond may say, I have never been a fan of the "identity wins" rules for containers given that we don't have a corresponding rule requiring __eq__ to return True for x.__eq__(x).

On Wed, Apr 27, 2011 at 10:27 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote:
> Note that ctypes' floats already behave this way:
>
> >>> x = c_double(float('nan'))
> >>> x == x
> True

But ctypes floats are not numbers. I don't think this provides any evidence (except of possibly a shortcut in the ctypes implementation for == :-).

> Before we go down this path, I would like to discuss another
> peculiarity of NaNs:
>
> >>> float('nan') < 0
> False
> >>> float('nan') > 0
> False
>
> This property in my experience causes much more trouble than nan ==
> nan being false. The problem is that common sorting or binary search
> algorithms may degenerate into infinite loops in the presence of nans.
> This may even happen when searching for a finite value in a large
> array that contains a single nan. Errors like this do happen in the
> wild and after chasing a bug like this programmers tend to avoid
> nans at all costs. Oftentimes this leads to using "magic"
> placeholders such as 1e300 for missing data.
>
> Since py3k has already made None < 0 an error, it may be reasonable
> for float('nan') < 0 to raise an error as well (probably ValueError
> rather than TypeError). This will not make lists with nans sortable
> or searchable using binary search, but will make associated bugs
> easier to find.

Hmm...
It feels like a much bigger can of worms and I'm not at all sure that it is going to work out any better than the current behavior (which can be coarsely characterized as tough shit, float + {NaN} do not form a total ordering :-). Remember when some string comparisons would raise exceptions if uncomparable Unicode and non-Unicode values were involved? That was a major pain and we gladly killed that in Py3k. (Though it was for ==/!=, not for etc.) Basically I think the IEEE std has probably done a decent job of defining how NaNs should behave, with the exception of object identity -- because the IEEE std does not deal with objects, only with values. The only other thing that could perhaps work would be to disallow NaN from ever being created, instead always raising an exception if NaN would be produced. Like we do with division by zero. But that would be a *huge* incompatible change to Python's floating point capabilities and I'm not interested in going there. The *only* point where I think we might have a real problem is the discrepancy between individual NaN comparisons and
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote:
> In my opinion assert should be avoided completely anywhere else than
> in the tests. If this is a wrong statement, please let me know why :)

I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. In regular code, assert should be about detecting buggy code. It should not be used to test for error conditions in input data. (Both these can be summarized as "if you still want the test to happen with -O, don't use assert.")

--
--Guido van Rossum (python.org/~guido)
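The "-O" rule of thumb above can be illustrated without restarting the interpreter, because the built-in compile() accepts an optimize argument (0 keeps asserts, 1 removes them, matching plain "python" and "python -O"). The sketch below is only an illustration; the functions withdraw and withdraw_checked and the helpers load and outcome are hypothetical names invented for this example.

```python
SOURCE = '''
def withdraw(balance, amount):
    # Input validation done with assert: disappears under -O.
    assert amount >= 0, "amount must be non-negative"
    return balance - amount

def withdraw_checked(balance, amount):
    # Input validation done with an explicit exception: always runs.
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return balance - amount
'''


def load(optimize):
    """Compile SOURCE at the given optimization level and return its namespace."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace


def outcome(func, *args):
    """Return func(*args), or the name of the exception it raised."""
    try:
        return func(*args)
    except Exception as exc:
        return type(exc).__name__


normal = load(optimize=0)     # behaves like plain `python`
optimized = load(optimize=1)  # behaves like `python -O`

assert_normal = outcome(normal["withdraw"], 100, -5)
assert_optimized = outcome(optimized["withdraw"], 100, -5)
explicit_optimized = outcome(optimized["withdraw_checked"], 100, -5)
```

With optimization on, the assert-based check silently lets the bad input through, while the explicit ValueError still fires, which is why input data should not be validated with assert.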
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Alexander Belopolsky wrote:
> With the status quo in Python, it may only make sense to store NaNs in
> array.array, but not in a list.

That's a bit extreme. It only gets you into trouble if you reason like this:

    >>> a = b = [1, 2, 3, float('nan')]
    >>> if a == b:
    ...     for x,y in zip(a,b):
    ...         assert x==y
    ...
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
    AssertionError

But it's perfectly fine to do this:

    >>> sum(a)
    nan

exactly as expected. Prohibiting NANs from lists is massive overkill for a small (alleged) problem.

I know thousands of words have been spilled on this, including many by myself, but I really believe this discussion is mostly bike-shedding. Given the vehemence of some replies, and the volume of talk, anyone would think that you could hardly write a line of Python code without badly tripping over problems caused by NANs. The truth is, I think, that most people will never see one in real world code, and those who are least likely to come across them are the most likely to be categorically against them.

(I grant that Alexander is an exception -- I understand that he does do a lot of numeric work, and does come across NANs, and still doesn't like them one bit.)

--
Steven
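The interactive session in the message above can be restated as a small script, which makes the mismatch between list equality and element-wise equality explicit. This is a sketch of CPython's observed behaviour, not a statement about what the language guarantees; the variable names are chosen here for illustration.

```python
import math

nan = float("nan")
a = b = [1, 2, 3, nan]

# List comparison treats an element as equal to itself when it is the
# very same object, so the two names for one list compare equal.
lists_equal = (a == b)

# Comparing the elements one by one uses float.__eq__ directly,
# and nan == nan is False under IEEE 754 rules.
elementwise_equal = all(x == y for x, y in zip(a, b))

# Arithmetic is unaffected: the NaN simply propagates through the sum.
total = sum(a)
total_is_nan = math.isnan(total)
```

So "a == b" does not imply that every pair of corresponding elements compares equal once a NaN is present.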
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Rob Cliffe wrote:
> To me the idea of non-reflexive equality (an object not being equal to
> itself) is abhorrent. Nothing is more likely to put off new Python
> users if they happen to run into it.

I believe that's a gross exaggeration. In any case, that's just your opinion, and Python is hardly the only language that supports (at least partially) NANs.

Besides, floats have all sorts of unintuitive properties that go against properties of real numbers, and new users manage to cope. With floats, even ignoring NANs, you can't assume:

    a*(b+c) == a*b + a*c
    a+b+c = c+b+a
    1.0/x*x = 1
    x+y-x = y
    x+1 > x

or many other properties of real numbers. In real code, the lack of reflexivity for NANs is just not that important. You can program for *years* without once accidentally stumbling over one, whereas you can't do the simplest floating point calculation without stubbing your toes on things like this:

    >>> 1.0/10
    0.10000000000000001

Search the archives of the python-l...@python.org mailing list. You will find regular questions from newbies similar to "Why doesn't Python calculate 1/10 correctly, is it broken?" (Except that most of the time they don't *ask* if it's broken, they just declare that it is.)

Compared to that, which is concrete and obvious and frequent, NANs are usually rare and mild. The fact is, NANs are useful. Less useful in Python, which goes out of the way to avoid generating them (a pity, in my opinion), but still useful.

> Basically it's deferring to a wart, of dubious value, in floating
> point calculations and/or the IEEE754 standard, and allowing it to
> become a monstrous carbuncle disfiguring the whole language.

A ridiculous over-reaction. How long have you been programming in Python? Months? Years? If the language was disfigured by a monstrous carbuncle, you haven't noticed until now.

> I think implementations of equal/not-equal which make equality
> non-reflexive (and thus break "identity implies equality") should be
> considered broken.
Then Python is broken by design, because by design *all* rich comparison methods can do anything.

> On 27/04/2011 15:53, Guido van Rossum wrote:
>> Maybe we should just call off the odd NaN comparison behavior?
>
> Right on, Guido. (A pity that a lot of people don't seem to be
> listening.)

Oh we're listening. Some of us are just *disagreeing*.

--
Steven
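A few of the float properties listed in the message above can be checked directly. The values below are illustrative picks made for this sketch (they are not from the thread), and the results assume IEEE 754 double precision, which is what CPython uses on common platforms.

```python
from decimal import Decimal

# Reordering a sum can change the result: the rounding errors differ.
a, b, c = 0.1, 0.2, 0.3
reordering_holds = (a + b + c == c + b + a)

# Above 2**53 the spacing between adjacent doubles is larger than 1,
# so adding 1 can be lost entirely.
big = 1e16
increment_visible = (big + 1 > big)
round_trip_holds = (big + 1 - big == 1)

# 1.0/10 is stored as the nearest binary fraction, not exactly one tenth.
# Decimal(float) converts the stored value exactly, exposing the difference.
tenth_is_exact = (Decimal(1.0 / 10) == Decimal("0.1"))
```

All four flags come out False, none of them involving a NaN, which is the point being made about everyday floating point surprises.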
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Guido van Rossum wrote:
> *If* my proposal gets accepted, there will be a blanket rule that no
> matter how exotic a type's __eq__ is defined, self.__eq__(self)
> (i.e., __eq__ called with the same *object* argument) must return True
> if the type's __eq__ is to be considered well-behaved; and Python
> containers may assume (for the purpose of optimizing their own
> comparison operations) that their elements have a well-behaved __eq__.

I think that so long as "badly defined" objects are explicitly still permitted (with the understanding that they may behave badly in containers), and so long as NANs continue to be badly behaved in this sense, then I could live with that. It's really just formalizing the status quo as deliberate policy rather than an accident:

    nan == nan will still return False
    [nan] == [nan] will still return True.

Purists on both sides will hate it :)

--
Steven
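The status quo summarized above can be reproduced in a few lines. The function element_eq below is a pure-Python sketch of the identity shortcut this thread attributes to PyObject_RichCompareBool (identity first, then ==); it is an illustration, not the C implementation, and the Null class mirrors the one used earlier in the thread.

```python
def element_eq(x, y):
    """Sketch of the shortcut: identical objects count as equal without
    calling __eq__; otherwise fall back to the ordinary == comparison."""
    return x is y or x == y


class Null:
    """An object that deliberately never compares equal, even to itself."""

    def __eq__(self, other):
        return False


nan = float("nan")
null = Null()

scalar_nan_equal = (nan == nan)             # plain float comparison
list_nan_equal = ([nan] == [nan])           # same object on both sides
fresh_nans_equal = ([float("nan")] == [float("nan")])  # two distinct objects

scalar_null_equal = (null == null)
list_null_equal = ([null] == [null])
null_in_list = (null in [null])             # membership uses the shortcut too

sketch_matches_list = (element_eq(nan, nan) == list_nan_equal)
```

The scalar comparisons are False while the container comparisons on the same object are True, and two separately created NaN objects still compare unequal inside lists in CPython.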
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Steven D'Aprano wrote:
> I know thousands of words have been spilled on this, including many by
> myself, but I really believe this discussion is mostly bike-shedding.

Hmmm... on reflection, I think I may have been a bit unfair. In particular, I don't mean any slight on any of the people who have made intelligent, insightful posts, even if I disagree with them.

--
Steven
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 12:59 PM, Guido van Rossum wrote: On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote: In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. In regular code, assert should be about detecting buggy code. It should not be used to test for error conditions in input data. (Both these can be summarized as if you still want the test to happen with -O, don't use assert.) You're both right! :) My take on assert is don't use it, ever. assert is supposed to be about conditions that never happen. So there are a few cases where I might use it: If I use it to enforce a precondition, it's wrong because under -OO my preconditions won't be checked and my input might be invalid. If I use it to enforce a postcondition, then my API's consumers have to occasionally handle this weird error, except it won't be checked under -OO so they won't be able to handle it consistently. If I use it to try to make assertions about internal state during a computation, then I introduce an additional, untested (at the very least untested under -OO), probably undocumented (did I remember to say and raises AssertionError when... in its docstring?) code path where when this bad thing happens, I get an exception instead of a result. If that's an important failure mode, then there ought to be a documented exception, which the computation's consumers can deal with. If it really should never happen, then I really should have just written some unit tests verifying that it doesn't happen in any case I can think of. And I shouldn't be writing code to handle cases I can't come up with any way to exercise, because how do I know that it's going to do the right thing? 
(If I had a dollar for every 'assert' message that didn't have the right number of arguments to its format string, etc.) Also, when things that should never happen do actually happen in real life, is a random exception that interrupts the process actually an improvement over just continuing on with some potentially bad data? In most cases, no, it really isn't, because by blowing up you've removed the ability of the user to take corrective action or do a workaround. (In the cases where blowing up is better because you're about to do something destructive, again, a test seems in order.) My python code is very well documented, which means that there is sometimes a significant runtime overhead from docstrings. That's really my only interest in -OO: reducing memory footprint of Python processes by dropping dozens of megabytes of library documentation from each process. The fact that it changes the semantics of 'assert' is an unfortunate distraction. So the only time I'd even consider using 'assert' is in a throwaway script which might be run once, that I'm not going to write any tests for and I'm not going to maintain, but I might care about just enough to want to blow up instead of calling 'os.unlink' if certain conditions are not met. (But then every time I actually use it that way, I realize that I should have dealt with the error sanely and I probably have to go back and fix it anyway.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
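A minimal sketch of the alternative argued for above: a precondition enforced with a documented exception instead of assert, so the check still runs under -O and -OO. The names scale and InvalidRatioError are hypothetical and exist only for illustration.

```python
class InvalidRatioError(ValueError):
    """Hypothetical documented exception for an out-of-range ratio."""


def scale(values, ratio):
    """Multiply each value by ratio, which must lie in [0, 1].

    Raises InvalidRatioError otherwise. Unlike an assert statement,
    this check is not stripped when Python runs with -O or -OO.
    """
    # A NaN ratio fails both comparisons, so it is rejected as well.
    if not 0.0 <= ratio <= 1.0:
        raise InvalidRatioError("ratio must be in [0, 1], got %r" % (ratio,))
    return [v * ratio for v in values]
```

Callers can catch InvalidRatioError (or plain ValueError) consistently, whatever optimization flags the interpreter was started with.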
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 1:25 PM, Steven D'Aprano st...@pearwood.info wrote: .. But it's perfectly fine to do this: >>> sum(a) nan This use case reminded me of Kahan's "Were there no way to get rid of NaNs, they would be as useless as Indefinites on CRAYs; as soon as one were encountered, computation would be best stopped rather than continued for an indefinite time to an Indefinite conclusion." http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps More often than not, you would want to sum non-NaN values instead. .. (I grant that Alexander is an exception -- I understand that he does do a lot of numeric work, and does come across NANs, and still doesn't like them one bit.) I like NaNs for high-performance calculations, but once you wrap floats individually in Python objects, performance is killed and you are better off using None instead of NaN. Python lists don't support element-wise operations and therefore there is little gain from being able to write x + y in loops over list elements instead of ieee_add(x, y) or add_or_none(x, y) with proper definitions of these functions. On the other hand, __eq__ gets invoked implicitly in cases where you don't have access to the loop. Your only choice is to filter your data before invoking such operations.
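A rough sketch of the two ideas in this message: filtering NaNs out before summing, and an add_or_none helper that uses None as the missing-value marker. The message names add_or_none but gives no definition, so the body below is an assumption; sum_skipping_nans is a hypothetical name.

```python
import math


def sum_skipping_nans(values):
    """Sum only the values that are not NaN."""
    return sum(v for v in values if not math.isnan(v))


def add_or_none(x, y):
    """Add two values, treating None as 'missing' and propagating it."""
    if x is None or y is None:
        return None
    return x + y


data = [1.0, float("nan"), 2.5]
total_with_nan = sum(data)                 # the NaN propagates into the sum
total_filtered = sum_skipping_nans(data)   # NaN dropped before summing
```

The filter has to be applied by the caller because, as noted above, operations that invoke __eq__ implicitly give no hook inside their loop.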
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 28/04/2011 18:26, Steven D'Aprano wrote: Rob Cliffe wrote: To me the idea of non-reflexive equality (an object not being equal to itself) is abhorrent. Nothing is more likely to put off new Python users if they happen to run into it. I believe that's a gross exaggeration. In any case, that's just your opinion, and Python is hardly the only language that supports (at least partially) NANs. Besides, floats have all sorts of unintuitive properties that go against properties of real numbers, and new users manage to cope. With floats, even ignoring NANs, you can't assume: a*(b+c) == a*b + a*c, a+b+c == c+b+a, 1.0/x*x == 1, x+y-x == y, x+1 > x, or many other properties of real numbers. In real code, the lack of reflexivity for NANs is just not that important. You can program for *years* without once accidentally stumbling over one, whereas you can't do the simplest floating point calculation without stubbing your toes on things like this: >>> 1.0/10 0.10000000000000001 Of course, these are inevitable consequences of floating-point representation. Inevitable in just about *any* language. The fact is, NANs are useful. Less useful in Python, which goes out of the way to avoid generating them (a pity, in my opinion), but still useful. I am not arguing against the use of NANs. Or even against different NANs not being equal to each other. What I was arguing about was the behaviour of Python objects that represent NANs, specifically in allowing x == x to be False, something which is *not* inevitable but a choice of language design or usage. Rob Cliffe
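For concreteness, a small sketch of some of the float properties listed above, using specific values chosen for illustration (they are not taken from the thread) and assuming IEEE 754 doubles, which is what CPython uses on common platforms.

```python
# Addition is not associative, so a+b+c and c+b+a can differ.
left_to_right = 0.1 + 0.2 + 0.3
right_to_left = 0.3 + 0.2 + 0.1
sums_agree = (left_to_right == right_to_left)

# x + 1 > x fails once x is large enough that 1 is below the spacing
# between adjacent doubles.
big = 1e16
adding_one_changes_big = (big + 1 > big)

# The case under discussion: a NaN object does not compare equal to itself.
nan = float("nan")
nan_equals_itself = (nan == nan)
```

All three flags come out False, but only the last one is a choice about how the object's __eq__ behaves rather than a consequence of rounding.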
[Python-Dev] Proposal for a common benchmark suite
Hello, As announced in my GSoC proposal I'd like to announce which benchmarks I'll use for the benchmark suite I will work on this summer. As of now there are two benchmark suites (that I know of) which receive some sort of attention, those are the ones developed as part of the PyPy project[1] which is used for http://speed.pypy.org and the one initially developed for Unladen Swallow which has been continued by CPython[2]. The PyPy benchmarks contain a lot of interesting benchmarks some explicitly developed for that suite, the CPython benchmarks have an extensive set of microbenchmarks in the pybench package as well as the previously mentioned modifications made to the Unladen Swallow benchmarks. I'd like to simply merge both suites so that no changes are lost. However I'd like to leave out the waf benchmark which is part of the PyPy suite, the removal was proposed on pypy-dev for obvious deficits[3]. It will be easier to add a better benchmark later than replacing it at a later point. Unless there is a major issue with this plan I'd like to go forward with this. .. [1]: https://bitbucket.org/pypy/benchmarks .. [2]: http://hg.python.org/benchmarks .. [3]: http://mailrepository.com/pypy-dev.codespeak.net/msg/3627509/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Simple XML-RPC server over SSL/TLS
Éric Araujo mer...@netwok.org wrote: Hi, But what I would like to know, is if is there any reason why XML-RPC can't optionally work over TLS/SSL using Python's ssl module. I'll create a ticket, and send a patch, but I was wondering if it was a reason why this was not implemented. I think there’s no deeper reason than nobody thought about it. The ssl module is new in 2.6 and 3.x, xmlrpc is an older module for an old technology *cough*, so feel free to open a bug report. Patch guidelines are found at http://docs.python.org/devguide Thanks in advance! What he said. I'm not a big fan of XMLRPC in the first place, so I probably didn't even notice that there wasn't SSL support for it. Go for it! Bill ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/2011 6:11 AM, Nick Coghlan wrote: On Thu, Apr 28, 2011 at 6:30 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Thu, Apr 28, 2011 at 3:57 AM, Nick Coghlanncogh...@gmail.com wrote: .. It is an interesting question of what sane invariants are. Why you consider the invariants that you listed essential while say if c1 == c2: assert all(x == y for x,y in zip(c1, c2)) optional? Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true *regardless* of IEEE754 by checking identity explicitly. AFAIK, IEEE754 says nothing about comparison of containers, so my invariant cannot violate it. What you probably wanted to say is that my invariant cannot be achieved in the presence of IEEE754 conforming floats, but this observation by itself does not make my invariant less important than yours. It just makes yours easier to maintain. No, I meant what I said. Your assertion includes a direct comparison between values (the x == y part) which means that IEEE754 has a bearing on whether or not it is a valid assertion. Every single one of my stated invariants consists solely of relationships between containers, or between a container and its contents. This keeps them all out of the domain of IEEE754 since the *container implementers* get to decide whether or not to factor object identity into the management of the container contents. The core containment invariant is really only this one: for x in c: assert x in c That is, if we iterate over a container, all entries returned should be in the container. Hopefully it is non-controversial that this is a sane and reasonable invariant for a container *user* to expect. 
The comparison invariants follow from the definition of set equivalence as: set1 == set2 iff all(x in set2 for x in set1) and all(y in set1 for y in set2) Again, notice that there is no comparison of items here - merely a consideration of the way items relate to containers. I agree that the container (author) gets to define container equality. The definition should also be correctly documented. Section 5.9, Comparisons, says "Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length." This, I believe, is the same as what Hrvoje said: "I would expect l1 == l2, where l1 and l2 are both lists, to be semantically equivalent to len(l1) == len(l2) and all(imap(operator.eq, l1, l2))." But "Currently it isn't, and that was the motivation for this thread." In this case, I think the discrepancy should be fixed by changing the doc: add 'be identical or ' before 'compare equal'. -- Terry Jan Reedy
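The discrepancy described above can be reproduced with a short sketch. It uses map rather than the Python 2 imap from the quoted expression, and it relies on CPython's list comparison checking identity before equality.

```python
import operator

nan = float("nan")
l1 = [1.0, nan]
l2 = [1.0, nan]   # holds the very same NaN object as l1

# Built-in list comparison: identical elements are treated as equal.
builtin_equal = (l1 == l2)

# The "semantically equivalent" expression from the quoted expectation.
elementwise_equal = len(l1) == len(l2) and all(map(operator.eq, l1, l2))

# The core containment invariant: every item yielded is "in" the container.
containment_holds = all(x in l1 for x in l1)

# With two distinct NaN objects there is no identity match to fall back on.
distinct_nans_equal = ([float("nan")] == [float("nan")])
```

Here builtin_equal and containment_holds are True while elementwise_equal and distinct_nans_equal are False, which is the gap the proposed wording 'be identical or compare equal' would document.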
Re: [Python-Dev] Proposal for a common benchmark suite
DasIch, 28.04.2011 20:55: the CPython benchmarks have an extensive set of microbenchmarks in the pybench package Try not to care too much about pybench. There is some value in it, but some of its microbenchmarks are also tied to CPython's interpreter behaviour. For example, the benchmarks for literals can easily be considered dead code by other Python implementations so that they may end up optimising the benchmarked code away completely, or at least partially. That makes a comparison of the results somewhat pointless. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Identity implies equality
ISTM there is no right or wrong answer. There is just a question of what is most useful. AFAICT, the code for dictionaries (and therefore the code for sets) has always had identity-implies-equality logic. It makes dicts blindingly fast for common cases. It also confers some nice properties like making it possible to retrieve a NaN that has been stored as a key; otherwise, you could store it but not look it up, pop it, or delete it (because the equality test would always fail). The logic also confers other nice-to-have properties such as: * d[k] = v; assert k in d # assignment-implies-contains * assert all(k in d for k in d) # all-members-are-members These aren't essential invariants but they do provide a pleasant programming environment and make it easier to reason about programs. Another place where identity-implies-equality logic is explicit is in PyObject_RichCompareBool(). That lets many other functions and methods work like dicts and sets. It speeds them up and confers some nice-to-haves like: * mylist.append(obj) implies mylist.count(obj) > 0 * x = obj implies x == obj # assignment really works There may be lots of other code that implicitly makes similar assumptions. I don't know how you could reliably find those and rip them out. If identity-implies-equality does get ripped out, I don't know what we would win. It would make it possible to do some cute NaN tricks, but I don't think you can defend against the general problem of funky objects being able to muck up code that looks correct. You get oddities when an object lies about its length. You get oddities when an object has a hash that doesn't match its equality function. The situation with NaNs and sorts is a prime example: >>> sorted([1.2, 3.4, float('Nan'), -1.2, float('Inf'), float('Nan')]) [1.2, 3.4, nan, -1.2, inf, nan] Personally, I think the status quo is fine and that practicality is beating purity. High quality programs are written every day.
Numeric programmers seem to have no problem using NaNs as-is. AFAICT, the only actual problem in front of us is the OP's post where he was able to surprise himself with some NaN experiments at the interactive prompt. Raymond
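The nice-to-have properties listed in this message can be checked with a short sketch; the variable names are made up for illustration and the behaviour shown is CPython's.

```python
nan = float("nan")

# A NaN stored as a dict key can be looked up again through the same object,
# because the dict checks identity before calling __eq__.
d = {nan: "stored"}
found_same_object = d[nan]

# A different NaN object is not found: no identity match, and == is False.
fresh_nan_is_key = float("nan") in d

# append-implies-count and all-members-are-members.
mylist = []
mylist.append(nan)
count_after_append = mylist.count(nan)
all_members_are_members = all(k in d for k in d)

# The key can also be removed again through the same object.
popped = d.pop(nan)
```

Without the identity check, the lookup, the count and the pop would all miss the stored NaN.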
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
On 4/28/2011 4:40 AM, Mark Shannon wrote: NaN is *not* a number (the clue is in the name). The problem is that the committee itself did not believe or stay consistent with that. In the text of the draft, they apparently refer to NaN as an indefinite, unspecified *number*. Sort of like a random variable with a uniform pseudo* distribution over the reals (* 0 everywhere with integral 1). Or a quantum particle present but smeared out over all space. And that apparently is their rationale for NaN != NaN: an unspecified number will equal another unspecified number with probability 0. The rationale for bool(NaN)==True is that an unspecified *number* will be 0 with probability 0. If NaN truly indicated an *absence* (like 0 and '') then bool(NaN) should be False. I think the committee goofed -- badly. Statisticians used missing value indicators long before the committee existed. They had no problem thinking that the indicator, as an object, equaled itself. So one could write (and I often did through the 1980s) the equivalent of for i,x in enumerate(datavec): if x == XMIS: # singleton missing value indicator for BMDP datavec[i] = default (Statistics packages have no concept of identity different from equality.) If statisticians had made XMIS != XMIS, that obvious code would not have worked, as it will not today for Python. Instead, the special case circumlocution of if isXMIS(x): would have been required, adding one more unnecessary function to the list of builtins. NaN is, in its domain, the equivalent of None (== Not a Value), which also serves as an alternative to immediately raising an exception. But like XMIS, None==None. Also, bool(None) is correctly False for something that indicates absence. Python treats it as if it were a number: As I said, so did the committee, and that was its mistake that we are more or less stuck with. NaN does not have to be a float or a Decimal. Perhaps it should have its own class.
Like None. As pointed out by Meyer: "NaN == NaN is False" is no more logical than "NaN != NaN is False". This is wrong if False/True are interpreted as probabilities 0 and 1. To summarise: NaN is required so that floating point operations on arrays and lists do not raise unwanted exceptions. Like None. NaN is Not a Number (therefore should be neither a float nor a Decimal). Making it a new class would solve some of the problems discussed, but would create new problems instead. Agreed, if we were starting fresh. Correct behaviour of collections is more important than IEEE conformance of NaN comparisons. Also agreed. -- Terry Jan Reedy
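The missing-value loop described in this message can be sketched in Python as follows. XMIS here is a hypothetical stand-in object, not the BMDP indicator itself, and the function names are made up; the second function shows the isXMIS-style circumlocution that NaN requires, using math.isnan.

```python
import math

XMIS = object()   # hypothetical singleton missing-value indicator


def fill_missing(datavec, default):
    """Replace every XMIS entry in place; plain == works for the sentinel."""
    for i, x in enumerate(datavec):
        if x == XMIS:
            datavec[i] = default
    return datavec


def fill_nans(datavec, default):
    """Same job for NaN: x == nan never matches, so a predicate is needed."""
    for i, x in enumerate(datavec):
        if math.isnan(x):
            datavec[i] = default
    return datavec


def fill_nans_naively(datavec, default):
    """The 'obvious' loop from the message, which silently does nothing."""
    nan = float("nan")
    for i, x in enumerate(datavec):
        if x == nan:
            datavec[i] = default
    return datavec
```

Calling fill_nans_naively([1.0, float("nan")], 0.0) leaves the NaN in place, which is the failure described above.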
Re: [Python-Dev] Identity implies equality
On Thu, Apr 28, 2011 at 9:51 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: Personally, I think the status quo is fine and that practicality is beating purity. +1 Raymond cheers lvh ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Proposal for a common benchmark suite
Stefan Behnel wrote: DasIch, 28.04.2011 20:55: the CPython benchmarks have an extensive set of microbenchmarks in the pybench package Try not to care too much about pybench. There is some value in it, but some of its microbenchmarks are also tied to CPython's interpreter behaviour. For example, the benchmarks for literals can easily be considered dead code by other Python implementations so that they may end up optimising the benchmarked code away completely, or at least partially. That makes a comparison of the results somewhat pointless. The point of the micro benchmarks in pybench is to be able to compare them one-by-one, not by looking at the sum of the tests. If one implementation optimizes away some parts, then the comparison will show this fact very clearly - and that's the whole point. Taking the sum of the micro benchmarks only has some meaning as a very rough indicator of improvement. That's why I wrote pybench: to get a better, more detailed picture of what's happening, rather than trying to find some way of measuring average use. This average is very different depending on where you look: for some applications method calls may be very important, for others, arithmetic operations, and yet others may have more need for fast attribute lookup. -- Marc-Andre Lemburg eGenix.com (#1, Apr 28 2011)
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, Apr 28, 2011 at 6:59 PM, Guido van Rossum gu...@python.org wrote: On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote: In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. FWIW this is only true for the unittest module/pkg policy for writing and organising tests. There are other popular test frameworks like nose and pytest which promote using plain asserts when writing unit tests and also allow writing tests as functions. And judging from my tutorials and other places, many people appreciate the ease of using asserts as compared to learning tons of new methods. YMMV. Holger In regular code, assert should be about detecting buggy code. It should not be used to test for error conditions in input data. (Both these can be summarized as "if you still want the test to happen with -O, don't use assert.") -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/2011 12:55 PM, Guido van Rossum wrote: *If* my proposal gets accepted, there will be a blanket rule that no matter how exotic an type's __eq__ is defined, self.__eq__(self) (i.e., __eq__ called with the same *object* argument) must return True if the type's __eq__ is to be considered well-behaved; This, to me, is a statement of the obvious ;-), but it should be stated in the docs. Do you also propose to make NaNs at least this well-behaved or leave them ill-behaved? and Python containers may assume (for the purpose of optimizing their own comparison operations) that their elements have a well-behaved __eq__. This almost states the status quo of the implementation, and the doc needs to be updated correspondingly. I do not think we should let object ill-behavior infect containers, so that they also become ill-behaved (not equal to themselves). -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] the role of assert in the standard library ?
On 4/28/2011 11:22 AM, s...@pobox.com wrote: Barry: I would agree. Use asserts for "this can't possibly happen" (wink) conditions. Without looking, I suspect that's probably what the author thought he was doing. You wish: to repeat the example from threading: def __init__(self, group=None, target=None, name=None, args=(), kwargs=None, verbose=None): assert group is None, "group argument must be None for now" is something that can easily happen. -- Terry Jan Reedy
Re: [Python-Dev] the role of assert in the standard library ?
On 4/28/11 3:27 PM, Holger Krekel wrote: On Thu, Apr 28, 2011 at 6:59 PM, Guido van Rossumgu...@python.org wrote: On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadéziade.ta...@gmail.com wrote: In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. FWIW this is only true for the unittest module/pkg policy for writing and organising tests. There are other popular test frameworks like nose and pytest which promote using plain asserts within writing unit tests and also allow to write tests in functions. And judging from my tutorials and others places many people appreciate the ease of using asserts as compared to learning tons of new methods. YMMV. Holger regular code, assert should be about detecting buggy code. It should not be used to test for error conditions in input data. (Both these can be summarized as if you still want the test to happen with -O, don't use assert.) Regardless of whether those frameworks encourage it, it's still the wrong thing to do for the reason that Guido states. Some bugs only show up under -O, so you ought to be running your test suite under -O, too. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
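The -O point can be observed directly with a small sketch that runs the same one-line program in a child interpreter with and without the flag. The helper name exit_code is hypothetical; PYTHONOPTIMIZE is removed from the child's environment so that only the command-line flag decides.

```python
import os
import subprocess
import sys


def exit_code(*interpreter_flags):
    """Run `assert False` in a child interpreter and return its exit status."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    cmd = [sys.executable, *interpreter_flags, "-c", "assert False"]
    return subprocess.run(cmd, env=env, stderr=subprocess.DEVNULL).returncode


plain = exit_code()          # the assertion fires: nonzero exit status
optimized = exit_code("-O")  # the assert is compiled away: exit status 0
```

A test written as a bare assert therefore passes vacuously under -O, whereas a unittest-style self.assertXyzzy() call still executes.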
Re: [Python-Dev] Proposal for a common benchmark suite
M.-A. Lemburg, 28.04.2011 22:23: Stefan Behnel wrote: DasIch, 28.04.2011 20:55: the CPython benchmarks have an extensive set of microbenchmarks in the pybench package Try not to care too much about pybench. There is some value in it, but some of its microbenchmarks are also tied to CPython's interpreter behaviour. For example, the benchmarks for literals can easily be considered dead code by other Python implementations so that they may end up optimising the benchmarked code away completely, or at least partially. That makes a comparison of the results somewhat pointless. The point of the micro benchmarks in pybench is to be able to compare them one-by-one, not by looking at the sum of the tests. If one implementation optimizes away some parts, then the comparison will show this fact very clearly - and that's the whole point. Taking the sum of the micro benchmarks only has some meaning as very rough indicator of improvement. That's why I wrote pybench: to get a better, more details picture of what's happening, rather than trying to find some way of measuring average use. This average is very different depending on where you look: for some applications method calls may be very important, for others, arithmetic operations, and yet others may have more need for fast attribute lookup. I wasn't talking about averages or sums, and I also wasn't trying to put down pybench in general. As it stands, it makes sense as a benchmark for CPython. However, I'm arguing that a substantial part of it does not make sense as a benchmark for PyPy and others. With Cython, I couldn't get some of the literal arithmetic benchmarks to run at all. The runner script simply bails out with an error when the benchmarks accidentally run faster than the initial empty loop. I imagine that PyPy would eventually even drop the loop itself, thus leaving nothing to compare. Does that tell us that PyPy is faster than Cython for arithmetic? I don't think it does. 
When I see that a benchmark shows that one implementation runs in 100% less time than another, I simply go *shrug* and look for a better benchmark to compare the two. Stefan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 1:48 PM, Terry Reedy tjre...@udel.edu wrote: On 4/28/2011 12:55 PM, Guido van Rossum wrote: *If* my proposal gets accepted, there will be a blanket rule that no matter how exotic a type's __eq__ is defined, self.__eq__(self) (i.e., __eq__ called with the same *object* argument) must return True if the type's __eq__ is to be considered well-behaved; This, to me, is a statement of the obvious ;-), but it should be stated in the docs. Do you also propose to make NaNs at least this well-behaved or leave them ill-behaved? As I said, my proposal is to consider this a bug of the same severity as __hash__ and __eq__ disagreeing, and would require float and Decimal to be changed. The more conservative folks are in favor of not changing anything (except the abstract Sequence class), and solving things by documentation only. In that case the exotic current behavior of NaN should not be considered a bug but merely unusual, and the behavior of collections (assuming an object is always equal to itself, never mind what its __eq__ says) documented as just that. There would not be any mention of well-behaved nor a judgment that NaN is not well-behaved. If my proposal is accepted, the definition of sequence comparison etc. would actually become simpler, since it should not have to mention the special-casing of object identity; instead it could mention the assumption of items being well-behaved. Again, the relationship between __eq__ and __hash__ would be the model here; and in fact a well-behaved type would have both properties (__eq__ returns true -> same __hash__, object identity -> __eq__ returns true). A type that is not well-behaved has a bug. I do not want to declare the behavior of NaN a bug. and Python containers may assume (for the purpose of optimizing their own comparison operations) that their elements have a well-behaved __eq__. This almost states the status quo of the implementation, and the doc needs to be updated correspondingly.
I do not think we should let object ill-behavior infect containers, so that they also become ill-behaved (not equal to themselves). There are other kinds of bad behavior that will still affect containers. So we have no choice about containers containing ill-behaved objects being (potentially) ill-behaved. In some sense the primary issue at hand is whether "x == x returns False" indicates that x has a bug, or not. If it is a bug, the current float and Decimal types have that bug, and need to be fixed; and then the current behavior of containers is 'merely' an optimization which may fail if there is a buggy item. The alternative is that we continue to say that it is not a bug, merely exotic, and that containers should test for identity before equality, not just as an optimization, but as the very essence of their semantics. The third option would be to say that the optimization is wrong. But nobody wants that, as it would require a container's __eq__ method to always compare all items before returning True, even when comparing a container to *itself*. -- --Guido van Rossum (python.org/~guido)
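A hedged sketch of the two "well-behaved" properties described in this message, written as small predicate functions. The function names are hypothetical and nothing like them is proposed in the thread; they only restate the two implications in code.

```python
def is_reflexive(obj):
    """Object identity implies equality: obj == obj is true."""
    return bool(obj == obj)


def hash_agrees_with_eq(a, b):
    """Equality implies equal hashes: a == b means hash(a) == hash(b)."""
    return not (a == b) or hash(a) == hash(b)


samples = [0, 1.5, "text", (1, 2), float("inf"), float("nan")]
ill_behaved = [x for x in samples if not is_reflexive(x)]
```

Under the proposal, the single entry that ends up in ill_behaved (the NaN) would count as a bug in float; under the documentation-only alternative it stays merely unusual and containers keep testing identity first.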
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 1:27 PM, Holger Krekel wrote: On Thu, Apr 28, 2011 at 6:59 PM, Guido van Rossum gu...@python.org wrote: On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote: In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. FWIW this is only true for the unittest module/pkg policy for writing and organising tests. There are other popular test frameworks like nose and pytest which promote using plain asserts within writing unit tests and also allow to write tests in functions. And judging from my tutorials and others places many people appreciate the ease of using asserts as compared to learning tons of new methods. YMMV. I've also observed that people appreciate using asserts with nose.py and py.test. Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] the role of assert in the standard library ?
On Thu, Apr 28, 2011 at 2:53 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: On Apr 28, 2011, at 1:27 PM, Holger Krekel wrote: On Thu, Apr 28, 2011 at 6:59 PM, Guido van Rossum gu...@python.org wrote: On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote: In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. FWIW this is only true for the unittest module/pkg policy for writing and organising tests. There are other popular test frameworks like nose and pytest which promote using plain asserts within writing unit tests and also allow to write tests in functions. And judging from my tutorials and others places many people appreciate the ease of using asserts as compared to learning tons of new methods. YMMV. I've also observed that people appreciate using asserts with nose.py and py.test. They must not appreciate -O. :-) -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/11 11:55 AM, Guido van Rossum wrote: On Thu, Apr 28, 2011 at 8:52 AM, Robert Kernrobert.k...@gmail.com wrote: Smaller, certainly. But now it's a trilemma. :-) 1. Have just np.float64 and np.complex128 scalars follow the Python float semantics since they subclass Python float and complex, respectively. 2. Have all np.float* and np.complex* scalars follow the Python float semantics. 3. Keep the current IEEE-754 semantics for all float scalar types. *If* my proposal gets accepted, there will be a blanket rule that no matter how exotic an type's __eq__ is defined, self.__eq__(self) (i.e., __eq__ called with the same *object* argument) must return True if the type's __eq__ is to be considered well-behaved; and Python containers may assume (for the purpose of optimizing their own comparison operations) that their elements have a well-behaved __eq__. *If* so, then we would then just have to decide between #2 and #3. With respect to this proposal, how does that interact with types that do not return bools for rich comparisons? For example, numpy arrays return bool arrays for comparisons. SQLAlchemy expressions return other SQLAlchemy expressions, etc. I realize that by being not well-behaved in this respect, we give up our rights to be proper elements in sortable, containment-checking containers. But in this and your followup message, you seem to be making a stronger statement that self.__eq__(self) not returning anything but True would be a bug in all contexts. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 3:07 PM, Guido van Rossum wrote: On Thu, Apr 28, 2011 at 2:53 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: On Apr 28, 2011, at 1:27 PM, Holger Krekel wrote: On Thu, Apr 28, 2011 at 6:59 PM, Guido van Rossum gu...@python.org wrote: On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote: In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :) I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. FWIW this is only true for the unittest module/pkg policy for writing and organising tests. There are other popular test frameworks like nose and pytest which promote using plain asserts within writing unit tests and also allow to write tests in functions. And judging from my tutorials and others places many people appreciate the ease of using asserts as compared to learning tons of new methods. YMMV. I've also observed that people appreciate using asserts with nose.py and py.test. They must not appreciate -O. :-) It might be nice if there were a pragma or module variable to selectively enable asserts for a given test module so that -O would turn-off asserts in the production code but leave them on in a test_suite. Raymond ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
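The interaction between -O and plain assert statements that this exchange turns on can be checked directly. The following is a minimal sketch, not anything proposed in the thread: it runs a one-line program in a child interpreter with and without -O. The names PROGRAM and run are made up for illustration, and the sketch assumes the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys

# A tiny program whose only guard is a (failing) assert statement.
PROGRAM = "assert 1 + 1 == 3, 'arithmetic is broken'; print('assert was skipped')"


def run(*flags):
    """Run PROGRAM in a fresh interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", PROGRAM],
        capture_output=True,
        text=True,
    )


normal = run()          # asserts active: the child exits with AssertionError
optimized = run("-O")   # asserts compiled away: the child reaches print()

print(normal.returncode, optimized.returncode)
```

A test suite written with bare asserts and run under -O would therefore pass without checking anything, which is the concern behind Guido's remark.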
Re: [Python-Dev] Identity implies equality
On Fri, Apr 29, 2011 at 5:51 AM, Raymond Hettinger raymond.hettin...@gmail.com wrote:

* x = obj implies x == obj # assignment really works

While I agree with your point of view regarding the status quo as a useful, practical compromise, I need to call out that particular example:

    >>> nan = float('nan')
    >>> x = nan
    >>> x == nan
    False
    >>> x in locals().values()
    True

Due to rich comparison and the freedom to implement non-reflexive definitions of equality, the assignment x = obj implies only that:

- x is obj
- x in locals().values()

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
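As a small sketch of the distinction Nick draws, the script below separates the three relations involved: identity, equality, and containment. The variable names are illustrative only; the behaviour shown is that of CPython's float and list types.

```python
nan = float("nan")
x = nan
other = float("nan")                 # a second, distinct NaN object

identical = x is nan                 # assignment binds the same object
equal = x == nan                     # float equality is not reflexive for NaN
contained = x in [nan]               # membership tries identity before ==
other_contained = other in [nan]     # different object, and == fails too

print(identical, equal, contained, other_contained)
```

Only the second NaN object is reported as absent, so membership here depends on which object is asked about rather than on its value.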
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 3:22 PM, Robert Kern robert.k...@gmail.com wrote: On 4/28/11 11:55 AM, Guido van Rossum wrote: On Thu, Apr 28, 2011 at 8:52 AM, Robert Kernrobert.k...@gmail.com wrote: Smaller, certainly. But now it's a trilemma. :-) 1. Have just np.float64 and np.complex128 scalars follow the Python float semantics since they subclass Python float and complex, respectively. 2. Have all np.float* and np.complex* scalars follow the Python float semantics. 3. Keep the current IEEE-754 semantics for all float scalar types. *If* my proposal gets accepted, there will be a blanket rule that no matter how exotic an type's __eq__ is defined, self.__eq__(self) (i.e., __eq__ called with the same *object* argument) must return True if the type's __eq__ is to be considered well-behaved; and Python containers may assume (for the purpose of optimizing their own comparison operations) that their elements have a well-behaved __eq__. *If* so, then we would then just have to decide between #2 and #3. With respect to this proposal, how does that interact with types that do not return bools for rich comparisons? For example, numpy arrays return bool arrays for comparisons. SQLAlchemy expressions return other SQLAlchemy expressions, etc. I realize that by being not well-behaved in this respect, we give up our rights to be proper elements in sortable, containment-checking containers. But in this and your followup message, you seem to be making a stronger statement that self.__eq__(self) not returning anything but True would be a bug in all contexts. Sorry, we'll have to make an exception for those of course. This will somewhat complicate the interpretation of well-behaved, because those are *not* well-behaved as far as containers go (both dict key lookup and list membership are affected) but it is not a bug -- however it is a bug to put these in containers and then use container comparisons on the container. 
-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Fri, Apr 29, 2011 at 8:55 AM, Guido van Rossum gu...@python.org wrote: Sorry, we'll have to make an exception for those of course. This will somewhat complicate the interpretation of well-behaved, because those are *not* well-behaved as far as containers go (both dict key lookup and list membership are affected) but it is not a bug -- however it is a bug to put these in containers and then use container comparisons on the container. That's a point in favour of the status quo, though - with the burden of enforcing reflexivity placed on the containers, types are free to make use of rich comparisons to return more than just simple True/False results. I hadn't really thought about it that way before this discussion - it is the identity checking behaviour of the builtin containers that lets us sensibly handle cases like sets of NumPy arrays. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] the role of assert in the standard library ?
2011/4/28 Raymond Hettinger raymond.hettin...@gmail.com: It might be nice if there were a pragma or module variable to selectively enable asserts for a given test module so that -O would turn-off asserts in the production code but leave them on in a test_suite. from __future__ import perfect_code_so_no_asserts :) -- Regards, Benjamin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Fri, Apr 29, 2011 at 2:55 AM, Guido van Rossum gu...@python.org wrote: Raymond strongly believes that containers must be allowed to use the modified definition, I believe purely for performance reasons. (Without this rule, a list or tuple could not even cut short being compared to *itself*.) It seems you are in that camp too. I'm a fan of the status quo, but not just for performance reasons - there is quite a bit of set theory that breaks once you allow non-reflexive equality*, so it makes sense to me to make it official that containers should ignore any non-reflexivity they come across. *To all the mathematicians in the audience yelling at their screens that the very idea of non-reflexive equality is an oxymoron... yes, I know :P Cheers, Nick. P.S. It's hard to explain the slightly odd point of view that seeing standard arithmetic constructed from Peano's Axioms and set theory can give you on discussions like this. It's a seriously different (and strange) way of thinking about the basic arithmetic constructs we normally take for granted, though :) -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 4:04 PM, Nick Coghlan ncogh...@gmail.com wrote:

On Fri, Apr 29, 2011 at 8:55 AM, Guido van Rossum gu...@python.org wrote:

Sorry, we'll have to make an exception for those of course. This will somewhat complicate the interpretation of well-behaved, because those are *not* well-behaved as far as containers go (both dict key lookup and list membership are affected) but it is not a bug -- however it is a bug to put these in containers and then use container comparisons on the container.

That's a point in favour of the status quo, though - with the burden of enforcing reflexivity placed on the containers, types are free to make use of rich comparisons to return more than just simple True/False results.

Possibly. Though for types that *do* return True/False, NaN's behavior can still be infuriating.

I hadn't really thought about it that way before this discussion - it is the identity checking behaviour of the builtin containers that lets us sensibly handle cases like sets of NumPy arrays.

But do they? For non-empty arrays, __eq__ will always return something that is considered true, so any hash collisions will cause false positives. And look at this simple example:

    >>> class C(list):
    ...     def __eq__(self, other):
    ...         if isinstance(other, C):
    ...             return [x == y for x, y in zip(self, other)]
    ...
    >>> a = C([1,2,3])
    >>> b = C([2,1,3])
    >>> a == b
    [False, False, True]
    >>> x = [a, a]
    >>> b in x
    True

-- --Guido van Rossum (python.org/~guido)
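The false positive in Guido's example follows from how built-in sequences decide membership: roughly "identical, or the result of == is truthy". The sketch below restates his class as a script and adds a hypothetical helper, contains, that approximates that rule. It is an illustration under that assumption, not CPython's actual implementation, and it returns NotImplemented for non-C operands where the original falls through to None.

```python
class C(list):
    """List subclass whose == returns an element-wise list of bools."""

    def __eq__(self, other):
        if isinstance(other, C):
            return [x == y for x, y in zip(self, other)]
        return NotImplemented


def contains(seq, item):
    """Approximate `item in seq`: identity first, then truthiness of ==."""
    return any(elem is item or bool(elem == item) for elem in seq)


a = C([1, 2, 3])
b = C([2, 1, 3])

elementwise = a == b                 # a plain list of per-element results
found = b in [a, a]                  # the non-empty result list counts as true
sketch_found = contains([a, a], b)   # the helper reaches the same answer

print(elementwise, found, sketch_found)
```

Because any non-empty list is truthy, every comparison between two non-empty C instances counts as a match, whatever the element-wise result says.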
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Fri, Apr 29, 2011 at 9:13 AM, Guido van Rossum gu...@python.org wrote:

I hadn't really thought about it that way before this discussion - it is the identity checking behaviour of the builtin containers that lets us sensibly handle cases like sets of NumPy arrays.

But do they? For non-empty arrays, __eq__ will always return something that is considered true, so any hash collisions will cause false positives. And look at this simple example:

    >>> class C(list):
    ...     def __eq__(self, other):
    ...         if isinstance(other, C):
    ...             return [x == y for x, y in zip(self, other)]
    ...
    >>> a = C([1,2,3])
    >>> b = C([2,1,3])
    >>> a == b
    [False, False, True]
    >>> x = [a, a]
    >>> b in x
    True

Hmm, true. And things like count() and index() would still be thoroughly broken for sequences. OK, so scratch that idea - there's simply no sane way to handle such objects without using an identity-based container that ignores equality definitions altogether.

Pondering the NaN problem further, I think we can relatively easily argue that reflexive behaviour at the object level fits within the scope of IEEE754.

1. IEEE754 is a value-based system, with a finite number of distinct NaN payloads
2. Python is an object-based system. In addition to their payload, NaN objects are further distinguished by their identity (infinite in theory, in practice limited by available memory).
3. We can still technically be conformant with IEEE754 even if we say that a given NaN object is equivalent to itself, but not to other NaN objects with the same payload.
Unfortunately, this still doesn't play well with serialisation, which assumes that the identity of float objects doesn't matter:

    >>> import pickle
    >>> nan = float('nan')
    >>> x = [nan, nan]
    >>> x[0] is x[1]
    True
    >>> y = pickle.loads(pickle.dumps(x))
    >>> y
    [nan, nan]
    >>> y[0] is y[1]
    False

Contrast that with the handling of lists, where identity is known to be significant:

    >>> x = [[]]*2
    >>> x[0] is x[1]
    True
    >>> y = pickle.loads(pickle.dumps(x))
    >>> y
    [[], []]
    >>> y[0] is y[1]
    True

I'd say I've definitely come around to being +0 on the idea of making the float() and decimal.Decimal() __eq__ definitions reflexive, but doing so does have implications when it comes to the ability to accurately save and restore application state. It isn't as simple as just adding "if self is other: return True" to the respective __eq__ implementations.

Regards, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
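The serialisation point can be restated as a script. This is a sketch of the behaviour described in the message, assuming CPython's pickle module, which does not record floats in its memo; the variable names are illustrative.

```python
import pickle

nan = float("nan")
x = [nan, nan]
y = pickle.loads(pickle.dumps(x))

shared_before = x[0] is x[1]       # both slots hold the same NaN object
shared_after = y[0] is y[1]        # the round trip creates two separate floats
self_equal = x == x                # list equality uses the identity shortcut
roundtrip_equal = x == y           # fails: distinct NaN objects, and == is False

# Lists are memoised by pickle, so shared identity survives the round trip.
lists = [[]] * 2
lists_copy = pickle.loads(pickle.dumps(lists))
list_identity_kept = lists_copy[0] is lists_copy[1]

print(shared_before, shared_after, self_equal, roundtrip_equal, list_identity_kept)
```

A list containing a NaN therefore compares equal to itself but not to its own pickled copy, which is the save-and-restore concern raised above.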
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 4:40 PM, Nick Coghlan ncogh...@gmail.com wrote: Pondering the NaN problem further, I think we can relatively easily argue that reflexive behaviour at the object level fits within the scope of IEEE754. Now we're talking. :-) 1. IEEE754 is a value-based system, with a finite number of distinct NaN payloads 2. Python is an object-based system. In addition to their payload, NaN objects are further distinguished by their identity (infinite in theory, in practice limited by available memory). 3. We can still technically be conformant with IEEE754 even if we say that a given NaN object is equivalent to itself, but not to other NaN objects with the same payload. Unfortunately, this still doesn't play well with serialisation, which assumes that the identity of float objects doesn't matter: import pickle nan = float('nan') x = [nan, nan] x[0] is x[1] True y = pickle.loads(pickle.dumps(x)) y [nan, nan] y[0] is y[1] False Contrast that with the handling of lists, where identity is known to be significant: x = [[]]*2 x[0] is x[1] True y = pickle.loads(pickle.dumps(x)) y [[], []] y[0] is y[1] True Probably wouldn't kill us if fixed pickle to take object identity into account for floats whose value is nan. (Fortunately for 3rd party types pickle always preserves identity.) I'd say I've definitely come around to being +0 on the idea of making the float() and decimal.Decimal() __eq__ definitions reflexive, but doing so does have implications when it comes to the ability to accurately save and restore application state. It isn't as simple as just adding if self is other: return True to the respective __eq__ implementations. But it seems pickle is *already* broken, so that can't really be an argument against the proposed change, right? 
-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 7:47 PM, Guido van Rossum gu...@python.org wrote: On Thu, Apr 28, 2011 at 4:40 PM, Nick Coghlan ncogh...@gmail.com wrote: Pondering the NaN problem further, I think we can relatively easily argue that reflexive behaviour at the object level fits within the scope of IEEE754. Now we're talking. :-) Note that Kahan is very critical of Java's approach, but NaN objects' comparison is not on his list of Java warts: http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/2011 4:40 PM, Nick Coghlan wrote:

Hmm, true. And things like count() and index() would still be thoroughly broken for sequences. OK, so scratch that idea - there's simply no sane way to handle such objects without using an identity-based container that ignores equality definitions altogether.

And the problem with that is that not all values are interned, to share a single identity per value, correct? On the other hand, proliferation of float objects containing NaN works, thus so would proliferation of non-float objects of the same value... but "works" would have a different meaning when there could be multiple identities of 6,981,433 in the same set.

But this does bring up an interesting enough point to cause me to rejoin the conversation: Would it be reasonable to implement 3 types of containers:

1) using __eq__ (would not use identity comparison optimization)
2) using "is" (the case you describe above)
3) the status quo: "is" or __eq__

The first two would require an explicit constructor call because the syntax would be retained for case 3 for backward compatibility. Heavy users of NaN and other similar values might find case 1 useful, although they would need to be careful with mappings and sets. Heavy users of NumPy and other similar structures might find case 2 useful. Offering the choice, and documenting the alternatives may make a lot more programmers choose the proper comparison operations, and less likely to overlook or pooh-pooh the issue with the thought that it won't happen to their program anyway...
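To make the three cases concrete, here is a rough sketch of what cases 1 and 2 could look like next to the built-in behaviour (case 3). IdentitySet and eq_only_contains are hypothetical names invented for this illustration; nothing like them was proposed in the thread or exists in the standard library.

```python
class IdentitySet:
    """Hypothetical case 2 container: membership is decided by `is` alone."""

    def __init__(self, items=()):
        self._items = {}                 # id(obj) -> obj; keeps members alive
        for item in items:
            self.add(item)

    def add(self, item):
        self._items[id(item)] = item

    def __contains__(self, item):
        return id(item) in self._items

    def __len__(self):
        return len(self._items)


def eq_only_contains(seq, item):
    """Hypothetical case 1 membership: == only, no identity shortcut."""
    for elem in seq:
        if elem == item:
            return True
    return False


nan = float("nan")
a, b = [1], [1]                          # equal values, distinct objects

status_quo = nan in [nan]                # case 3: found via identity
eq_only = eq_only_contains([nan], nan)   # case 1: NaN is never found
by_identity = nan in IdentitySet([nan])  # case 2: found via identity
equal_twin = b in IdentitySet([a])       # case 2: an equal twin is not a member
twin_by_eq = eq_only_contains([a], b)    # case 1: an equal twin is a member

print(status_quo, eq_only, by_identity, equal_twin, twin_by_eq)
```

The last two results show the trade-off Glenn alludes to: an identity-only container can hold several members with the same value, such as multiple 6,981,433 objects.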
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On Thu, Apr 28, 2011 at 5:24 PM, Glenn Linderman v+pyt...@g.nevcal.com wrote: Would it be reasonable to implement 3 types of containers: That's something for python-ideas. Occasionally containers that use custom comparisons come in handy (e.g. case-insensitive dicts). -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Not-a-Number (was PyObject_RichCompareBool identity shortcut)
Terry Reedy wrote:

I think the committee goofed -- badly. Statisticians used missing value indicators long before the committee existed. They had no problem thinking that the indicator, as an object, equaled itself. So one could write (and I often did through the 1980s) the equivalent of

    for i,x in enumerate(datavec):
        if x == XMIS: # singleton missing value indicator for BMDP
            datavec[i] = default

But NANs aren't missing values (although some people use them as such, that can be considered abuse of the concept). R distinguishes NANs from missing values: they have a built-in value NaN, and a separate built-in value NA which is the canonical missing value. R also treats comparisons of both special values as a missing value:

    > NA == NA
    [1] NA
    > NaN == NaN
    [1] NA

including reflexivity:

    > x = NA
    > x == x
    [1] NA

which strikes me as the worst of both worlds, guaranteed to annoy those who want the IEEE behaviour where NANs compare unequal, those like Terry who expect missing values to compare equal to other missing values, and those who want reflexivity to be treated as an invariant no matter what.

NaN is Not a Number (therefore should be neither a float nor a Decimal). Making it a new class would solve some of the problems discussed, but would create new problems instead.

Agreed, if we were starting fresh. I don't see that making NANs a separate class would make any practical difference what-so-ever, but the point is moot since we're not starting fresh :)

Correct behaviour of collections is more important than IEEE conformance of NaN comparisons.

Also agreed. To be pedantic, the IEEE standard doesn't have anything to say about comparisons of lists of floats that might contain NANs. Given the current *documented* behaviour that list equality is based on object equality, the actual behaviour is surprising, but I don't think there is anything wrong with the idea of containers assuming that their elements are reflexive.
-- Steven
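The gap between the documented and the actual list comparison mentioned above can be seen in a few lines. This is an illustrative sketch of CPython's behaviour; strict_equal is a made-up name for the purely ==-based comparison.

```python
nan = float("nan")
left = [nan]
right = [nan]                       # a different list holding the same NaN object

# Purely element-wise comparison with ==, with no identity shortcut:
strict_equal = all(a == b for a, b in zip(left, right))

# Built-in list comparison treats identical elements as equal:
builtin_equal = left == right

# With two distinct NaN objects, the built-in comparison fails as well:
distinct_equal = [float("nan")] == [float("nan")]

print(strict_equal, builtin_equal, distinct_equal)
```

Only the middle case differs from what a purely ==-based definition would give, and it differs precisely because the container assumes its elements are reflexive.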
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
On 4/28/11 6:13 PM, Guido van Rossum wrote: On Thu, Apr 28, 2011 at 4:04 PM, Nick Coghlanncogh...@gmail.com wrote: On Fri, Apr 29, 2011 at 8:55 AM, Guido van Rossumgu...@python.org wrote: Sorry, we'll have to make an exception for those of course. This will somewhat complicate the interpretation of well-behaved, because those are *not* well-behaved as far as containers go (both dict key lookup and list membership are affected) but it is not a bug -- however it is a bug to put these in containers and then use container comparisons on the container. That's a point in favour of the status quo, though - with the burden of enforcing reflexivity placed on the containers, types are free to make use of rich comparisons to return more than just simple True/False results. Possibly. Though for types that *do* return True/False, NaN's behavior can still be infuriating. I hadn't really thought about it that way before this discussion - it is the identity checking behaviour of the builtin containers that lets us sensibly handle cases like sets of NumPy arrays. But do they? For non-empty arrays, __eq__ will always return something that is considered true, Actually, numpy.ndarray.__nonzero__() raises an exception. We've decided that there are no good conventions for deciding whether an array should be considered True or False that won't mislead people. It's quite astonishing how many people will just test if x == y: or if x != y: for a single set of inputs and confirm their guess as to the general rule from that. But your point stands, numpy arrays cannot be members of sets or keys of dicts or orderable/in-able elements of lists. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. 
-- Umberto Eco
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Nick Coghlan wrote:

I hadn't really thought about it that way before this discussion - it is the identity checking behaviour of the builtin containers that lets us sensibly handle cases like sets of NumPy arrays.

Except that it doesn't:

    >>> from numpy import array
    >>> a1 = array([1,2])
    >>> a2 = array([3,4])
    >>> s = set([a1, a2])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'numpy.ndarray'

Lists aren't trouble-free either:

    >>> lst = [a1, a2]
    >>> a2 in lst
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

-- Greg
Re: [Python-Dev] Not-a-Number
Terry Reedy tjre...@udel.edu writes: On 4/28/2011 4:40 AM, Mark Shannon wrote: NaN does not have to be a float or a Decimal. Perhaps it should have its own class. Like None Would it make sense for ‘NaN’ to be another instance of ‘NoneType’? -- \ “I am too firm in my consciousness of the marvelous to be ever | `\ fascinated by the mere supernatural …” —Joseph Conrad, _The | _o__) Shadow-Line_ | Ben Finney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Not-a-Number
Taking a step back from all this, why does Python allow NaNs to arise from computations *at all*? +Inf and -Inf are arguably useful elements of the algebra, yet Python insists on raising an exception for 1.0/0.0 instead of returning an infinity. Why do this but not raise an exception for any operation that produces a NaN?

-- Greg
Re: [Python-Dev] Not-a-Number
On Fri, Apr 29, 2011 at 11:11 AM, Ben Finney ben+pyt...@benfinney.id.au wrote:

Would it make sense for ‘NaN’ to be another instance of ‘NoneType’?

This is fine IMHO as I (personally) find myself doing things like: if x is None: ...

cheers James

-- -- James Mills -- -- Problems are solved by method
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Nick Coghlan wrote:

1. IEEE754 is a value-based system, with a finite number of distinct NaN payloads
2. Python is an object-based system. In addition to their payload, NaN objects are further distinguished by their identity (infinite in theory, in practice limited by available memory).

I argue that's an implementation detail that makes no difference. NANs should compare unequal, including to themselves. That's the clear intention of IEEE-754. There's no exception made for "unless y is another name for x". If there was, NANs would be reflexive, and we wouldn't be having this discussion, but the non-reflexivity of NANs is intended behaviour.

The clear equivalent to object identity in value-languages is memory location. If you compare variable x to the same x, IEEE754 says you should get False. Consider:

    import math
    import random

    # Two distinct NANs are calculated somewhere...
    x = float('nan')
    y = float('nan')
    # They find themselves in some data in some arbitrary place
    seq = [1, 2, x, y]
    random.shuffle(seq)
    # And later x is compared to some arbitrary element in the data
    if math.isnan(x):
        if x == seq[0]:
            print("Definitely not a NAN")

nan != x is an important invariant, breaking it just makes NANs more complicated and less useful. Tests will need to be written "if x == y and not math.isnan(x)" to avoid getting the wrong result for NANs.

I don't see what the problem is that we're trying to fix. If containers wish to define container equality as taking identity into account, good for the container. Document it and move on, but please don't touch floats.

-- Steven
Re: [Python-Dev] Not-a-Number
Greg Ewing wrote: Taking a step back from all this, why does Python allow NaNs to arise from computations *at all*? The real question should be, why does Python treat all NANs as signalling NANs instead of quiet NANs? I don't believe this helps anyone. +Inf and -Inf are arguably useful elements of the algebra, yet Python insists on raising an exception for 1.0./0.0 instead of returning an infinity. I would argue that Python is wrong to do so. As I've mentioned a couple of times now, 20 years ago Apple felt that NANs and INFs weren't too complicated for non-programmers using Hypercard. There's no sign that Apple were wrong to expose NANs and INFs to users, no flood of Hypercard users confused by NAN inequality. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Not-a-Number
On 4/28/11 8:44 PM, Steven D'Aprano wrote: Greg Ewing wrote: Taking a step back from all this, why does Python allow NaNs to arise from computations *at all*? The real question should be, why does Python treat all NANs as signalling NANs instead of quiet NANs? I don't believe this helps anyone. Actually, Python treats all NaNs as quiet NaNs and never signalling NaNs. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
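Robert's statement can be illustrated with plain float arithmetic. This is a sketch of CPython's observable behaviour and the variable names are illustrative: NaNs that do arise propagate silently, while the exception Steven has in mind is raised specifically for division by zero, before any NaN is produced.

```python
import math

inf = float("inf")

quiet = inf - inf              # produces a NaN without raising anything
propagated = quiet + 1.0       # arithmetic on a NaN quietly yields another NaN
overflowed = 1e308 * 10        # overflow in multiplication quietly yields inf

try:
    0.0 / 0
except ZeroDivisionError as exc:
    division_error = exc       # raised for the zero divisor, not for a NaN result

print(math.isnan(quiet), math.isnan(propagated), overflowed, repr(division_error))
```

So the exception comes from Python's handling of division by zero rather than from a signalling NaN. Once a NaN exists, it behaves as a quiet one.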
Re: [Python-Dev] the role of assert in the standard library ?
Tarek Ziadé writes: On Thu, Apr 28, 2011 at 5:26 PM, Barry Warsaw ba...@python.org wrote: BTW, I think it always helps to have a really good assert message, and/or a leading comment to explain *why* that condition can't possibly happen. why bother, it can't happen ;) Except before breakfast, says the Red Queen. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyObject_RichCompareBool identity shortcut
Steven D'Aprano writes:

(I grant that Alexander is an exception -- I understand that he does do a lot of numeric work, and does come across NANs, and still doesn't like them one bit.)

I don't think I'd want anybody who *likes* NaNs to have a commit bit at python.org. <shiver/>
Re: [Python-Dev] Not-a-Number
Robert Kern wrote: On 4/28/11 8:44 PM, Steven D'Aprano wrote: Greg Ewing wrote: Taking a step back from all this, why does Python allow NaNs to arise from computations *at all*? The real question should be, why does Python treat all NANs as signalling NANs instead of quiet NANs? I don't believe this helps anyone. Actually, Python treats all NaNs as quiet NaNs and never signalling NaNs. Sorry, did I get that backwards? I thought it was signalling NANs that cause a signal (in Python terms, an exception)? If I do x = 0.0/0 I get an exception instead of a NAN. Hence a signalling NAN. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com