[Python-Dev] Re: [Python-checkins] python/dist/src/Lib whrandom.py, 1.21, NONE
[EMAIL PROTECTED] Removed Files: whrandom.py Log Message: Remove the deprecated whrandom module. Woo hoo! It's about friggin' time <wink>. ___ Python-Dev mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
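For anyone still carrying whrandom code: the random module is its documented replacement, and independent generator streams come from random.Random instances. A minimal sketch (the seed value here is arbitrary):

```python
import random

# whrandom is gone; random.Random gives an independent, seedable
# generator stream, which covers whrandom's main use case.
gen = random.Random(42)
value = gen.random()
assert 0.0 <= value < 1.0
```

Module-level functions like random.random() share one hidden Random instance, so per-stream state only needs an explicit Random object.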
Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Python marshal.c, 1.79, 1.80
[Jeremy Hylton, on a quick 2.4.1] Nothing wrong with an incremental release, but none of these sound like critical bugs to me. [Aahz] You don't think a blowup in marshal is critical? Mind expanding on that? [Jeremy] An undocumented extension to marshal causes a segfault. It's certainly a bug worth fixing. It doesn't sound like a critical bug to me. The new optional ``version`` argument to marshal.dumps() is documented. The easiest way to see that is to look at 2.4's marshal.dumps() docs <wink>. Unfortunately, it was wholly untested. Still, it's a new-in-2.4 gimmick, and no pre-2.4 code could be using it. I suppose Armin found a use for it in 2.4, but I'm still scratching my head. If ZODB doesn't already depend on it, how useful can it be? QED. WRT my critical thread bug, I asked that everyone pretend I hadn't submitted it until a month after 2.4 was released. That hasn't happened yet, so I refuse to admit it exists. FWIW, I'd press on with 2.3.5 first, while it can still attract some volunteer effort.
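For reference, a sketch of the 2.4 gimmick under discussion: marshal.dumps() grew an optional ``version`` argument, and passing 0 requests the oldest data format so the bytes can be read by older interpreters (the sample payload here is arbitrary):

```python
import marshal

# version=0 selects the oldest marshal wire format, the cross-version
# trick Armin describes in the follow-up message.
data = marshal.dumps({'x': 1, 'spam': [1, 2, 3]}, 0)
roundtrip = marshal.loads(data)
assert roundtrip == {'x': 1, 'spam': [1, 2, 3]}
```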
Re: [Python-Dev] Re: [Python-checkins] python/dist/src/Python marshal.c, 1.79, 1.80
[Armin Rigo] Some code in the 'py' lib used to use marshal to send simple objects between the main process and a subprocess. We ran into trouble when we extended the idea to a subprocess that would actually run via ssh on a remote machine, and the remote machine's Python version didn't match the local one. The obvious quick fix was to set the 'version' argument to 0 and pretend that it would be fine forever. Yes, I believed you had *some* use for it <wink>. Now that we essentially can't use this trick any more because of the 2.4.0 bug, Well, you can still use 2.3.4 -- or wait for 2.4.1 -- or use your patched 2.4. Or use the stock 2.4, but set up a marshal server running 2.3.4 <heh>. we reverted to repr/eval, which is quite a bit slower (and actually not guaranteed to work across Python versions either: string escapes sometimes change). Really? The precise rules str's __repr__ uses for which escapes to produce certainly change, but I don't recall any case outside Unicodeland where a new string escape was ever introduced. So, e.g., current Python str.__repr__() produces '\n' for a newline, while long-ago Pythons produced '\012', but all versions of Python *accept* either form of escape. The biggest change of this kind was moving from octal escapes to hex escapes in Python 2.1, but hex escapes have always been accepted -- repr() just didn't produce them before 2.1. We avoid using cPickle because we want to be sure that only simple enough objects are sent over this way -- essentially nothing that depends on a particular Python module being installed and identical on both machines. It's possible (but irksome) to subclass pickle.py's Pickler class, and override its save_global() and save_inst() methods. For example, replace them with one-liners that just raise an exception. Then any attempt to pickle an object requiring a specific module will raise that exception. But if you're worried about speed, going thru pickle.py is significantly slower than going thru repr().
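A sketch of the subclass-and-override idea. It assumes the pure-Python Pickler (in modern CPython that's spelled pickle._Pickler, since the C accelerated Pickler can't be overridden this way, and save_inst no longer exists); the names SimplePickler and dumps_simple are made up for illustration:

```python
import io
import pickle

class SimplePickler(pickle._Pickler):
    """Pickler that refuses anything requiring a module import on the
    receiving side: pickling any global (a class, a function) raises,
    so only plain containers and scalars get through."""
    def save_global(self, obj, name=None):
        raise pickle.PicklingError("refusing to pickle global %r" % (obj,))

def dumps_simple(obj):
    buf = io.BytesIO()
    SimplePickler(buf).dump(obj)
    return buf.getvalue()

# Simple containers pass through untouched:
wire = dumps_simple([1, 2, {'a': 'b'}])
assert pickle.loads(wire) == [1, 2, {'a': 'b'}]

# ... but a class instance is refused, because saving it means saving
# its class, which goes through save_global():
class _Demo:
    pass
try:
    dumps_simple(_Demo())
    refused = False
except pickle.PicklingError:
    refused = True
```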
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
[Bob Ippolito] ... Your expectation is not correct for Darwin's memory allocation scheme. It seems that Darwin creates allocations of immutable size. The only way ANY part of an allocation will ever be used by ANYTHING else is if free() is called with that allocation. Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes. free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. In that case, it will create a new allocation of at least the requested size, copy the contents of the original allocation into the new allocation (probably with copy-on-write pages if it's large enough, so it might be cheap), and free() the allocation. Really? Another near-universal quality-of-implementation expectation is that a growing realloc() will strive to extend in-place, like realloc(malloc(100), 101). For example, the theoretical guarantee that one-at-a-time list.append() has amortized linear time doesn't depend on that, but pragmatically it's greatly helped by a reasonable growing realloc() implementation. In the case where realloc() specifies a size that is not greater than the allocation's size, it will simply return the given allocation and cause no side-effects whatsoever. Was this a good decision? Probably not! Sounds more like a bug (or two) to me than a decision, but I don't know. However, it is our (in the "I know you use Windows but I am not the only one that uses Mac OS X" sense) problem so long as Darwin is a supported platform, because it is highly unlikely that Apple will backport any fix to the allocator unless we can prove it has some security implications in software shipped with their OS. ... Is there any known case where Python performs poorly on this OS, for this reason, other than the "pass giant numbers to recv() and then shrink the string because we didn't get anywhere near that many bytes" case?
Claiming rampant performance problems should require evidence too <wink>. ... Presumably this can happen at other places (including third party extensions), so a better place to do this might be _PyString_Resize(). list_resize() is another reasonable place to put this. I'm sure there are other places that use realloc() too, and the majority of them do this through obmalloc. So maybe instead of trying to track down all the places where this can manifest, we should just gunk up Python and patch PyObject_Realloc()? There is no choke point for allocations in Python -- some places call the system realloc() directly. Maybe the latter matter on Darwin too, but maybe they don't. The scope of this hack spreads if they do. I have no idea how often realloc() is called directly by 3rd-party extension modules. It's called directly a lot in Zope's C code, but AFAICT only to grow vectors, never to shrink them. Since we are both pretty confident that other allocators aren't like Darwin, this gunk can be #ifdef'ed to the __APPLE__ case. #ifdef's are a last resort: they almost never go away, so they complicate the code forever after, and typically stick around for years even after the platform problems they were intended to address have been fixed. For obvious reasons, they're also an endless source of platform-specific bugs. Note that pymalloc already does a memcpy+free when, in PyObject_Realloc(p, n), p was obtained from the system malloc or realloc but n is small enough to meet the small-object threshold (pymalloc takes over small blocks that result from a PyObject_Realloc()). That's a reasonable strategy *because* n is always small in such cases. If you're going to extend this strategy to n of arbitrary size, then you may also create new performance problems for some apps on Darwin (copying n bytes can get arbitrarily expensive). ... I'm sure I'll find something, but what's important to me is that Python works well on Mac OS X, so something should happen.
I agree the socket-abuse case should be fiddled, and for more reasons than just Darwin's realloc() quirks. I don't know that there are actual problems on Darwin broader than that case (and I'm not challenging you to contrive one, I'm asking whether realloc() quirks are suspected in any other case that's known). Part of what you demonstrated when you said that pystone didn't slow down when you fiddled stuff is that pystone also didn't speed up. I also don't know that the memcpy+free wormaround is actually going to help more than it hurts overall. Yes, in the socket-abuse case, where the program routinely malloc()s strings millions of bytes larger than the socket can deliver, it would obviously help. That's not typical program behavior (however typical it may be of that specific app). More typical is shrinking a long list one element at a time, in which case about half the list remaining would get memcpy'd from time to time where such copies never get made
Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
[Tim Peters] Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes. [Bob Ippolito] Whatever that means. Well, it means what it said. The C standard says nothing about performance metrics of any kind, and a production-quality implementation of C requires very much more than just meeting what the standard requires. The phrase "quality of implementation" is used in the C Rationale (but not in the standard proper) to cover all such issues. realloc() pragmatics are quality-of-implementation issues; the accuracy of fp arithmetic is another (e.g., if you get back -666.0 from the C expression 1.0 + 2.0, there's nothing in the standard to justify a complaint). free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. From later comments feigning outrage <wink>, I take it that "the size of the allocation" here does not mean the specific number the user passed to the previous malloc/realloc call, but means whatever amount of address space the implementation decided to use internally. Sorry, but I assumed it meant the former at first. ... Was this a good decision? Probably not! Sounds more like a bug (or two) to me than a decision, but I don't know. You said yourself that it is standards compliant ;) I have filed it as a bug, but it is probably unlikely to be backported to current versions of Mac OS X unless a case can be made that it is indeed a security flaw. That's plausible. If you showed me a case where Python's list.sort() took cubic time, I'd certainly consider that to be a bug, despite that nothing promises better behavior.
If I wrote a malloc subsystem and somebody pointed out "did you know that when I malloc 1024**2+1 bytes, and then realloc(1), I lose the other megabyte forever?", I'd consider that to be a bug too (because, docs be damned, I wouldn't intentionally design a malloc subsystem with such behavior; and pymalloc does in fact copy bytes on a shrinking realloc in blocks it controls, whenever at least a quarter of the space is given back -- and it didn't at the start, and I considered that to be a bug when it was pointed out). ... Known case? No. Do I want to search Python application-space to find one? No. Serious problems on a platform are usually well-known to users on that platform. For example, it was well-known that Python's list-growing strategy as of a few years ago fragmented address space horribly on Win9X. This was a C quality-of-implementation issue specific to that platform. It was eventually resolved by improving the list-growing strategy on all platforms -- although it's still the case that Win9X does worse on list-growing than other platforms, it's no longer a disaster for most list-growing apps on Win9X. If there's a problem with "overallocate, then realloc() to cut back" on Darwin that affects many apps, then I'd expect Darwin users to know about that already -- lots of people have used Python on Macs since Python's beginning, mysterious slowdowns and mysterious bloat get noticed, and Darwin has been around for a while. ... There is no choke point for allocations in Python -- some places call the system realloc() directly. Maybe the latter matter on Darwin too, but maybe they don't. The scope of this hack spreads if they do. ... In the case of Python, "some places" means "nowhere relevant". Four standard library extension modules relevant to the platform use realloc directly:

    _sre:      uses realloc only to grow buffers
    cPickle:   uses realloc only to grow buffers
    cStringIO: uses realloc only to grow buffers
    regexpr:   uses realloc only to grow buffers

Good!
If Zope doesn't use the allocator that Python gives it, then it can deal with its own problems. I would expect most extensions to use Python's allocator. I don't know. ... They're [#ifdef's] also the only good way to deal with platform-specific inconsistencies. In this specific case, it's not even possible to determine if a particular allocator implementation is stupid or not without at least using a platform-allocator-specific function to query the size reserved by a given allocation. We've had bad experience on several platforms when passing large numbers to recv(). If that were addressed, it's unclear that Darwin realloc() behavior would remain a real issue. OTOH, it is clear that *just* worming around Darwin realloc() behavior won't help other platforms with problems in the same *immediate* area of bug 1092502. Gross over-allocation followed by a shrinking realloc() just isn't common in Python. sock_recv() is an exceptionally bad case. More typical is, e.g., fileobject.c's get_line(), where if a line exceeds 100 characters the buffer keeps growing by 25% until there's enough room, then it's cut back once at the end. That typical use for shrinking realloc() just isn't going to be implicated in a real problem -- the over
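The get_line() pattern described above can be sketched in Python (read_line_sketch, the 100-byte initial size, and the 25% growth factor mirror the description; the helper itself is hypothetical, not fileobject.c's actual code):

```python
def read_line_sketch(chunks):
    """Accumulate incoming chunks the way get_line() is described above:
    over-allocate by 25% while reading, then cut back exactly once at
    the end with a single shrinking resize."""
    buf = bytearray(100)              # initial guess, as in get_line()
    used = 0
    for chunk in chunks:
        while used + len(chunk) > len(buf):
            buf.extend(b'\0' * (len(buf) // 4))   # grow by 25%
        buf[used:used + len(chunk)] = chunk
        used += len(chunk)
    del buf[used:]                    # the one shrinking "realloc"
    return bytes(buf)
```

The point of the thread is the last step: one shrink at the end is harmless on most allocators, but on the Darwin behavior described above the over-allocated tail is never returned.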
Re: [Python-Dev] Re: super() harmful?
[Guido] Then why is the title "Python's Super Considered Harmful"??? Here's my final offer. Change the title to something like "Multiple Inheritance Pitfalls in Python" and nobody will get hurt. [Bill Janssen] Or better yet, considering the recent thread on Python marketing, "Multiple Inheritance Mastery in Python" :-). I'm sorry, but that's not good marketing -- it contains big words, and putting the brand name last is ineffective. How about "Python's Super() is Super -- Over 1528.7% Faster than C!"? BTW, it's important that fractional percentages end with an odd digit. Research shows that if the last digit is even, 34.1% of consumers tend to suspect the number was made up.
Re: [Python-Dev] a bunch of Patch reviews
[Martin asks whether Irmen wants to be a tracker admin on SF] [Irmen de Jong] That sounds very convenient, thanks. Does the status of 'python project member' come with certain expectations that must be complied with? ;-) If you're using Python, you're already required to comply with all of Guido's demands; this would just make it more official. Kinda like the difference in sanctifying cohabitation with a marriage ceremony <wink>. OK, really, the minimum required of Python project members is that they pay some attention to Python-Dev. 2- As shadow passwords can only be retrieved when you are root, is a unit test module even useful? Probably not. Alternatively, introduce a 'root' resource, and make that test depend on the presence of the root resource. I'm not sure what this resource actually is. I have seen them pass on my screen when executing the regression tests ("resource 'network' is not enabled", etc.) but never paid much attention to them. Are they used to select optional parts of the test suite that can only be run in certain conditions? That's right, where the condition is precisely that you tell regrtest.py to enable a (one or more) named resource. There's no intelligence involved. Resource names are arbitrary, and can be passed to regrtest.py's -u argument. See regrtest's docstring for details. For example, to run the tests that require the 'network' resource, pass "-u network". Then it will run network tests, regardless of whether a network is actually available. Passing "-u all" makes it try to run all tests.
Re: [Python-Dev] a bunch of Patch reviews
[Martin v. Löwis] ... - Add an entry to Misc/NEWS, if there is a new feature, or if it is a bug fix for a maintenance branch (I personally don't list bugs fixed in the HEAD revision, but others apparently do) You should. In part this is to comply with license requirements: we're a derivative work from CNRI and BeOpen's Python releases, and their licenses require that we include a brief summary of the changes made to Python. That certainly includes changes made to repair bugs. It's also extremely useful in practice to have a list of repaired bugs in NEWS! That saved me hours just yesterday, when trying to account for a Zope3 test that fails under Python 2.4 but works under 2.3.4. 2.4 NEWS pointed out that tuple hashing changed to close bug 942952, which I can't imagine how I would have remembered otherwise.
Re: [Python-Dev] state of 2.4 final release
[Anthony Baxter] I didn't see any replies to the last post, so I'll ask again with a better subject line - as I said last time, I'm not aware of anyone having done a fix for the issue Tim identified ( http://www.python.org/sf/1069160 ) So, my question is: Is this important enough to delay a 2.4 final for? [Tim] Not according to me; I said before I'd be happy if everyone pretended I hadn't filed that report until a month after 2.4 final was released. [Raymond Hettinger] Any chance of this getting fixed before 2.4.1 goes out in February? It probably won't be fixed by me. It would be better if a Unix-head volunteered to repair it, because the most likely kind of thread race (explained in the bug report) has proven impossible to provoke on Windows (short of carefully inserting sleeps into Python's C code) any of the times this bug has been reported in the past (the same kind of bug has appeared several times in different parts of Python's threading code -- holding the GIL is not sufficient protection against concurrent mutation of the tstate chain, for reasons explained in the bug report). A fix is very simple (also explained in the bug report) -- acquire the damn mutex, don't trust to luck.
Re: [Python-Dev] Python Interpreter Thread Safety?
... [Evan Jones] What I was trying to ask with my last email was: what are the trouble areas? There are probably many that I am unaware of, due to my unfamiliarity with the Python internals. Google on "Python free threading". That's not meant to be curt, it's just meant to recognize that the task is daunting and has been discussed often before. [Martin v. Löwis] Due to some unfortunate historical reasons, there is code which enters free() without holding the GIL - and that is what the allocator specifically deals with. Right, but as said in a previous post, I'm not convinced that the current implementation is completely correct anyway. Sorry, I haven't had time for this. From your earlier post: For example, is it possible to call PyMem_Free from two threads simultaneously? Possible but not legal; undefined behavior if you try. See the "Thread State and the Global Interpreter Lock" section of the Python C API manual: "... only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions". There are only a handful of exceptions to the last part of that rule, concerned with interpreter and thread startup and shutdown, and they're explicitly listed in that section. The memory-management functions aren't among them. In addition, it's not legal to call PyMem_Free regardless unless the pointer passed to it was originally obtained from another function in the PyMem_* family (that specifically excludes memory obtained from a PyObject_* function). In a release build, all of the PyMem_* allocators resolve directly to the platform malloc or realloc, and all PyMem_Free has to determine is that the memory *was* so allocated, so that it can call the platform free() directly (which is presumably safe to call without holding the GIL). The hacks in PyObject_Free (== PyMem_Free) are there solely so that question can be answered correctly in the absence of holding the GIL.
That question == "does pymalloc control the pointer passed to me, or does the system malloc?". In return, that hack is there solely because in much earlier versions of Python extension writers got into the horrible habit of allocating object memory with PyObject_New but releasing it with PyMem_Free, and because indeed Python didn't *have* a PyObject_Free function then. Other extension writers were just nuts, mixing PyMem_* calls with direct calls to system free/malloc/realloc, and ignoring GIL issues for all of those. When pymalloc was new, we went to insane lengths to avoid breaking that stuff, but enough is enough. Since the problem is that threads could call PyMem_Free without holding the GIL, it seems to me that it is possible. Yes, but not specific to PyMem_Free. It's clearly _possible_ to call _any_ function from multiple threads without holding the GIL. Shouldn't it also be supported? No. If what they want is the system malloc/realloc/free, that's what they should call. In the current memory allocator, I believe that situation can lead to inconsistent state. Certainly, but only if pymalloc controls the memory blocks. If they were actually obtained from the system malloc, the only part of pymalloc that has to work correctly is the Py_ADDRESS_IN_RANGE() macro. When that returns false, the only other thing PyObject_Free() does is call the system free() immediately, then return. None of pymalloc's data structures are involved, apart from the hacks ensuring that the array of arena base addresses is safe to access despite potentially concurrent mutation-by-appending. ... Basically, if a concurrent memory allocator is the requirement, It isn't. The attempt to _exploit_ the GIL by doing no internal locking of its own is 100% deliberate in pymalloc -- it's a significant speed win (albeit on some platforms more than others). then I think some other approach is necessary.
If it became necessary, that's what this section of obmalloc is for:

    SIMPLELOCK_DECL(_malloc_lock)
    #define LOCK()          SIMPLELOCK_LOCK(_malloc_lock)
    #define UNLOCK()        SIMPLELOCK_UNLOCK(_malloc_lock)
    #define LOCK_INIT()     SIMPLELOCK_INIT(_malloc_lock)
    #define LOCK_FINI()     SIMPLELOCK_FINI(_malloc_lock)

You'll see that PyObject_Free() calls LOCK() and UNLOCK() at appropriate places already, but they have empty expansions now. Back to the present: [Martin] Again, the interpreter supports multi-threading today. Removing the GIL is more difficult, though - nearly any container object (list, dictionary, etc) would have to change, plus the reference counting (which would have to grow atomic increment/decrement). [Evan] Wouldn't it be up to the programmer to ensure that accesses to shared objects, like containers, are serialized? For example, with Java's collections, there are both synchronized and unsynchronized versions. Enormous mounds of existing threaded Python code freely manipulates lists and dicts without explicit locking now. We can't break that -- and wouldn't want to. Writing
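The point about lock-free container use can be seen directly: in CPython the GIL makes each list.append() atomic, so concurrent appends from many threads need no explicit locking (a small sketch; the thread and iteration counts are arbitrary):

```python
import threading

# Many threads appending to one shared list, with no lock in sight --
# the kind of existing code the thread says can't be broken.
items = []

def worker():
    for i in range(1000):
        items.append(i)   # atomic under the GIL in CPython

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(items) == 8 * 1000   # no appends lost
```

Note this guarantee is a CPython implementation property, exactly the sort of thing a GIL-free interpreter would have to preserve or break.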
Re: [Python-Dev] 2.3.5 and 2.4.1 release plans
[Anthony Baxter] Ok, so here's the state of play: 2.3.5 is currently aimed for next Tuesday, but there's an outstanding issue - the new copy code appears to have broken something, see www.python.org/sf/1114776 for the gory details. ... [Alex Martelli] The problem boils down to: deepcopying an instance of a type that doesn't have an __mro__ (and is not one of the many types explicitly recorded in the _deepcopy_dispatch dictionary, such as types.ClassType, types.InstanceType, etc, etc). The easy fix: instead of cls.__mro__ use inspect.getmro which deals with that specifically. Before I commit the fix: can anybody help out with an example of a type anywhere in the standard library that should be deepcopyable, used to be deepcopyable in 2.3.4, isn't one of those which get explicitly recorded in copy._deepcopy_dispatch, AND doesn't have an __mro__? Even the _testcapi.Copyable type magically grows an __mro__; I'm not sure how to MAKE a type w/o one... Since the original bug report came from Zopeland, chances are good (although the report is too vague to be sure) that the problem involves ExtensionClass. That's complicated C code in Zope predating new-style classes, making it possible to build Python-class-like objects in C code under old Pythons. In general, EC-derived classes don't play well with newer Python features (well, at least not until Zope 2.8, where ExtensionClass is recoded as a new-style Python class -- but still keeping some semantics from old-style classes ... ). Anyway, I expect that instances of any EC-derived class would have the problem in the bug report. For example, the base Persistent class in ZODB 3.2.5 is an ExtensionClass:

    $ \python23\python.exe
    Python 2.3.5c1 (#61, Jan 25 2005, 19:52:06) [MSC v.1200 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ZODB    # don't ask -- it's necessary to import this first
    >>> from Persistence import Persistent
    >>> p = Persistent()
    >>> import copy
    >>> copy.deepcopy(p)   # deepcopy() barfs on __mro__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Python23\lib\copy.py", line 200, in deepcopy
        copier = _getspecial(cls, "__deepcopy__")
      File "C:\Python23\lib\copy.py", line 66, in _getspecial
        for basecls in cls.__mro__:
    AttributeError: __mro__
    >>> copy.copy(p)       # copy() does too
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "C:\Python23\lib\copy.py", line 86, in copy
        copier = _getspecial(cls, "__copy__")
      File "C:\Python23\lib\copy.py", line 66, in _getspecial
        for basecls in cls.__mro__:
    AttributeError: __mro__

Unsure whether this is enough, but at least inspect.getmro() isn't fazed by an EC-derived class:

    >>> inspect.getmro(Persistent)
    (<extension class Persistence.Persistent at 100040D8>,)

More info from the bug report filer is really needed. A problem is that this stuff doesn't appear to work under Python 2.3.4 either:

    $ ../Python-2.3.4/python
    Python 2.3.4 (#1, Aug 9 2004, 17:15:36) [GCC 3.2.2 20030222 (Red Hat Linux 3.2.2-5)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ZODB
    >>> from Persistence import Persistent
    >>> p = Persistent()
    >>> import copy
    >>> copy.deepcopy(p)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/tim/Python-2.3.4/Lib/copy.py", line 206, in deepcopy
        y = _reconstruct(x, rv, 1, memo)
      File "/home/tim/Python-2.3.4/Lib/copy.py", line 338, in _reconstruct
        y = callable(*args)
    TypeError: ExtensionClass object argument after * must be a sequence
    >>> copy.copy(p)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/home/tim/Python-2.3.4/Lib/copy.py", line 95, in copy
        return _reconstruct(x, rv, 0)
      File "/home/tim/Python-2.3.4/Lib/copy.py", line 338, in _reconstruct
        y = callable(*args)
    TypeError: ExtensionClass object argument after * must be a sequence
[Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39
[EMAIL PROTECTED] Modified Files: xmlrpclib.py Log Message: accept datetime.datetime instances when marshalling; dateTime.iso8601 elements still unmarshal into xmlrpclib.DateTime objects Index: xmlrpclib.py ...

    +        if datetime and isinstance(value, datetime.datetime):
    +            self.value = value.strftime("%Y%m%dT%H:%M:%S")
    +            return

... [and similarly later] ... Fred, is there a reason to avoid datetime.datetime's .isoformat() method here? Like so:

    >>> import datetime
    >>> print datetime.datetime(2005, 2, 10, 14, 0, 8).isoformat()
    2005-02-10T14:00:08

A possible downside is that you'll also get fractional seconds if the instance records a non-zero .microsecond value.
[Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39
[Tim] Fred, is there a reason to avoid datetime.datetime's .isoformat() method here? Like so: Yes. The XML-RPC spec is quite vague. It claims that the dates are in ISO 8601 format, but doesn't say anything more about it. The example shows a string without hyphens (but with colons), so I stuck with exactly that. Well, then since that isn't ISO 8601 format, it would be nice to have a comment explaining why it's claiming to be anyway <0.5 wink>. A possible downside is that you'll also get fractional seconds if the instance records a non-zero .microsecond value. There's nothing in the XML-RPC spec about the resolution of time, so, again, I'd rather be conservative in what we generate. dt.replace(microsecond=0).isoformat() suffices for that much. Tack on .replace('-', '') to do the whole job.
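Putting the two suggestions together (a sketch; the sample datetime is arbitrary) shows that the replace/isoformat route produces exactly the same hyphen-free, whole-second timestamp as the strftime call in the checkin:

```python
import datetime

# Drop fractional seconds, then drop the hyphens -- the "whole job"
# described above.
dt = datetime.datetime(2005, 2, 10, 14, 0, 8, 123456)
stamp = dt.replace(microsecond=0).isoformat().replace('-', '')
assert stamp == dt.strftime('%Y%m%dT%H:%M:%S')   # same as the checkin's format
```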
[Python-Dev] Re: [Python-checkins] python/dist/src/Lib xmlrpclib.py, 1.38, 1.39
[Tim] Well, then since that isn't ISO 8601 format, it would be nice to have a comment explaining why it's claiming to be anyway <0.5 wink>. [Fred] Hmm, that's right (ISO 8601:2000, section 5.4.2). Sigh. Ain't your fault. I didn't remember that I had seen the XML-RPC spec before, in conjunction with its crazy rules for representing floats. It's a very vague doc indeed. Anyway, some quick googling strongly suggests that many XML-RPC implementors don't know anything about 8601 either, and accept/produce only the non-8601 format inferred from the single example in the spec. Heh -- kids <wink>.
Re: [Python-Dev] ViewCVS on SourceForge is broken
[Trent Mick] Has anyone else noticed that viewcvs is broken on SF? It failed the same way from Virginia just now. I suppose that's your reward for kindly updating the Python copyright <wink>. The good news is that you can use this lull in your Python work to contribute to ZODB development! ViewCVS at zope.org is always happy to see you: http://svn.zope.org/ZODB/trunk/
Re: [Python-Dev] ViewCVS on SourceForge is broken
[Thomas Heller] http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1107968334 Jeez Louise! "As of 2005-02-09 there is an outage of anonymous CVS (tarballs, pserver-based CVS and ViewCVS) for projects whose UNIX names start with the letters m, n, p, q, t, y and z. We are currently working on resolving this issue." So that means it wouldn't even do us any good to rename the project to "Thomas", "Trent", "Mick", "Tim", "Peters", or "ZPython" either! All right. "Heller 2.5", here we come.
Re: [Python-Dev] builtin_id() returns negative numbers
[Troels Walsted Hansen] The Python binding in libxml2 uses the following code for __repr__():

class xmlNode(xmlCore):
    def __init__(self, _obj=None):
        self._o = None
        xmlCore.__init__(self, _obj=_obj)
    def __repr__(self):
        return "xmlNode (%s) object at 0x%x" % (self.name, id(self))

With Python 2.3.4 I'm seeing warnings like the one below:

frozen module libxml2:2357: FutureWarning: %u/%o/%x/%X of negative int will return a signed string in Python 2.4 and up

I believe this is caused by the memory address having the sign bit set, causing builtin_id() to return a negative integer.

Yes, that's right.

I grepped around in the Python standard library and found a rather awkward work-around that seems to be slowly propagating to various modules using the '%x' % id(self) idiom:

No, it's not propagating any more: I see that none of these exist in 2.4:

Lib/asyncore.py:
    # On some systems (RH10) id() can be a negative number.
    # work around this.
    MAX = 2L*sys.maxint+1
    return '%s at %#x' % (' '.join(status), id(self)&MAX)

$ grep -r 'can be a negative number' *
Lib/asyncore.py:# On some systems (RH10) id() can be a negative number.
Lib/repr.py:# On some systems (RH10) id() can be a negative number.
Lib/tarfile.py:# On some systems (RH10) id() can be a negative number.
Lib/test/test_repr.py:# On some systems (RH10) id() can be a negative number.
Lib/xml/dom/minidom.py:# On some systems (RH10) id() can be a negative number.

There are many modules that do not have this work-around in Python 2.3.4.

Not sure, but it looks like this stuff was ripped out in 2.4 simply because 2.4 no longer produces a FutureWarning in these cases. That doesn't address that the output changed, or that the output for a negative id() produced by %x under 2.4 is probably surprising to most.

Wouldn't it be more elegant to make builtin_id() return an unsigned long integer?

I think so. This is the function ZODB 3.3 uses, BTW:

# Addresses can look negative on some boxes, some of the time.  If you
# feed a negative address to an %x format, Python 2.3 displays it as
# unsigned, but produces a FutureWarning, because Python 2.4 will display
# it as signed.  So when you want to produce an address, use positive_id()
# to obtain it.
def positive_id(obj):
    """Return id(obj) as a non-negative integer."""
    result = id(obj)
    if result < 0:
        # This is a puzzle: there's no way to know the natural width of
        # addresses on this box (in particular, there's no necessary
        # relation to sys.maxint).  Try 32 bits first (and on a 32-bit
        # box, adding 2**32 gives a positive number with the same hex
        # representation as the original result).
        result += 1L << 32
        if result < 0:
            # Undo that, and try 64 bits.
            result -= 1L << 32
            result += 1L << 64
            assert result >= 0  # else addresses are fatter than 64 bits
    return result

That gives a non-negative result regardless of Python version and (almost) regardless of platform (the `assert` hasn't triggered on any ZODB 3.3 platform yet).

Is the performance impact too great?

For some app, somewhere, maybe. It's a tradeoff. The very widespread practice of embedding %x output from id() favors getting rid of the sign issue, IMO.

A long integer is used on platforms where SIZEOF_VOID_P > SIZEOF_LONG (most 64 bit platforms?),

Win64 is probably the only major (meaning "likely to be popular among Python users") platform where sizeof(void*) > sizeof(long).

so all Python code must be prepared to handle it already...

In theory wink.
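The asyncore-style mask works the same way as positive_id() for the repr idiom under discussion. A minimal sketch of that idiom (in Python 3, id() is already non-negative, so the mask is purely illustrative of the 2.x-era workaround; `Node` is a made-up class for demonstration):

```python
import sys

# Mirror asyncore's MAX = 2L*sys.maxint+1: an all-ones word-sized mask,
# so %x never sees a sign bit even if id() were negative.
MAX = 2 * sys.maxsize + 1

class Node(object):
    def __repr__(self):
        return 'Node object at 0x%x' % (id(self) & MAX)

print(repr(Node()))
```

Since `id(self) & MAX` is a no-op on non-negative ids, the same line works unchanged on either side of the 2.3/2.4 divide.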
Re: [Python-Dev] Re: builtin_id() returns negative numbers
[Fredrik Lundh] can anyone explain the struct.pack and ZODB use cases? the first one doesn't make sense to me, Not deep and surely not common, just possible. If you're on a 32-bit box and doing struct.pack("...i...", ... id(obj) ...), it in fact cannot fail now (no, that isn't guaranteed by the docs, it's just an implementation reality), but would fail if id() ever returned a positive long with the same bit pattern as a negative 32-bit int (OverflowError: long int too large to convert to int). and the other relies on Python *not* behaving as documented (which is worse than relying on undocumented behaviour, imo). I don't know what you think the problem with ZODB's integer-flavored keys might be, then. The problem I'm thinking of is that by "integer-flavored" they really mean *C* int, not Python integer (which is C long). They're delicate enough that way that they already don't work right on most current 64-bit boxes whenever the value of a Python int doesn't in fact fit in the platform's C int: http://collector.zope.org/Zope/1592 If id() returned a long in some cases on 32-bit boxes, then code using id() as key (in an II or IO tree) or value (in an II or OI tree) would stop working. Again, the Python docs didn't guarantee this would work, and the int-flavored BTrees have 64-bit box bugs in their handling of integers, but the id()-as-key-or-value case has nevertheless worked wholly reliably until now on 32-bit boxes. Any change in visible behavior has the potential to break code -- that shouldn't be controversial, because it's so obvious, and so relentlessly proved in real life. It's a tradeoff. I've said I'm in favor of taking away the sign issue for id() in this case, although I'm not going to claim that no code will break as a result, and I'd be a lot more positive about it if we could use the time machine to change this behavior for 2.4.
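The struct.pack failure mode can be sketched in a few lines. This is an illustration, not the original 2.x behavior: modern struct raises struct.error rather than OverflowError, and the explicit "<i" standard-size code stands in for the native "i" field a 2.x extension would have used. The address value is made up:

```python
import struct

# A 32-bit address with the high bit set.  As a signed C int it's
# negative; as a non-negative integer it no longer fits in "i".
addr = 0x80000040

# Packing the signed reinterpretation works fine -- this is what
# pre-change id() could hand to struct.pack on a 32-bit box...
signed = addr - 2**32  # -2147483584
assert struct.pack("<i", signed) == b"\x40\x00\x00\x80"

# ...but the positive equivalent of the same bit pattern overflows
# the same format code, which is the breakage Tim describes.
try:
    struct.pack("<i", addr)
except struct.error as e:
    print("overflow:", e)
```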
Re: [Python-Dev] pymalloc on 2.1.3
[Fredrik Lundh] does anyone remember if there were any big changes in pymalloc between the 2.1 series (where it was introduced) and 2.3 (where it was enabled by default). Yes, huge -- few original lines survived exactly, although many survived in intent. or in other words, is the 2.1.3 pymalloc stable enough for production use? Different question entirely wink. It _was_ used in production by some people, and happily so. Major differences: + 2.1 used a probabilistic scheme for guessing whether addresses passed to it were obtained from pymalloc or from the system malloc. It was easy for a malicious pure-Python program to corrupt pymalloc and/or malloc internals as a result, leading to things like segfaults, and even sneaky ways to mutate the Python bytecode stream. It's extremely unlikely that a non-malicious program could bump into these. + Horrid hackery went into 2.3's version to cater to broken extension modules that called PyMem functions without holding the GIL. 2.1's may not be as thread-safe in these cases. + 2.1's only fields requests up to 64 bytes, 2.3's up to 256 bytes. Changes in the dict implementation, and new-style classes, for 2.3 made it a pragmatic necessity to boost the limit for 2.3. (we're having serious memory fragmentation problems on a 2.1.3 system, and while I can patch/rebuild the interpreter if necessary, we cannot update the system right now...) I'd give it a shot -- pymalloc has always been very effective at handling large numbers of small objects gracefully. The meaning of "small" got 4x bigger since 2.1, which appeared to be a pure win, but 64 bytes was enough under 2.1 that most small instance dicts fit.
Re: [Python-Dev] Memory Allocator Part 2: Did I get it right?
[Evan Jones] After I finally understood what thread-safety guarantees the Python memory allocator needs to provide, I went and did some hard thinking about the code this afternoon. I believe that my modifications provide the same guarantees that the original version did. I do need to declare the arenas array to be volatile, and leak the array when resizing it. Please correct me if I am wrong, but the situation that needs to be supported is this: As I said before, I don't think we need to support this any more. More, I think we should not -- the support code is excruciatingly subtle, it wasted plenty of your time trying to keep it working, and if we keep it in it's going to continue to waste time over the coming years (for example, in the short term, it will waste my time reviewing it). While one thread holds the GIL, any other thread can call PyObject_Free with a pointer that was returned by the system malloc. What _was_ supported was more generally that any number of threads could call PyObject_Free with pointers that were returned by the system malloc/realloc at the same time as a single thread, holding the GIL, was doing anything whatsoever (including executing any code inside obmalloc.c) Although that's a misleading way of expressing the actual intent; more on that below. The following situation is *not* supported: While one thread holds the GIL, another thread calls PyObject_Free with a pointer that was returned by PyObject_Malloc. Right, that was never supported (and I doubt it could be without introducing a new mutex in obmalloc.c). I'm hoping that I got things a little better this time around. I've submitted my updated patch to the patch tracker. For reference, I've included links to SourceForge and the previous thread. Thank you, Thank you! I probably can't make time to review anything before this weekend. I will try to then. 
I expect it would be easier if you ripped out the horrid support for PyObject_Free abuse; in a sane world, the release-build PyMem_FREE, PyMem_Del, and PyMem_DEL would expand to free() instead of to PyObject_FREE (via changes to pymem.h). IOW, it was never the _intent_ that people be able to call PyObject_Free without holding the GIL. The need for that came from a different problem: old code sometimes mixed calls to PyObject_New with calls to PyMem_DEL (or PyMem_FREE or PyMem_Del). It's for that latter reason that PyMem_DEL (and its synonyms) were changed to expand to PyObject_Free. This shouldn't be supported anymore. Because it _was_ supported, there was no way to tell whether PyObject_Free was being called because (a) we were catering to long-obsolete but once-loved code that called PyMem_DEL while holding the GIL and with a pointer obtained by PyObject_New; or, (b) somebody was calling PyMem_Del (etc) with a non-object pointer they had obtained from PyMem_New, or from the system malloc directly. It was never legit to do #a without holding the GIL. It was clear as mud whether it was legit to do #b without holding the GIL. If PyMem_Del (etc) change to expand to free() in a release build, then #b can remain clear as mud without harming anyone. Nobody should be doing #a anymore. If someone still is, tough luck -- "fix it, you've had years of warning" is easy for me to live with at this stage. I suppose the other consideration is that already-compiled extension modules on non-Windows(*) systems will, if they're not recompiled, continue to call PyObject_Free everywhere they had a PyMem_Del/DEL/FREE call. If such code is calling it without holding the GIL, and obmalloc.c stops trying to support this insanity, then they're going to grow some thread races they wouldn't have if they did recompile (to get such call sites remapped to the system free()).
I don't really care about that either: it's a general rule that virtually all Python API functions must be called with the GIL held, and there was never an exception in the docs for the PyMem_ family. (*) Windows is immune simply because the Windows Python is set up in such a way that you always have to recompile extension modules when Python's minor version number (the j in i.j.k) gets bumped.
Re: [Python-Dev] Memory Allocator Part 2: Did I get it right?
[Tim Peters] As I said before, I don't think we need to support this any more. More, I think we should not -- the support code is excruciatingly subtle, it wasted plenty of your time trying to keep it working, and if we keep it in it's going to continue to waste time over the coming years (for example, in the short term, it will waste my time reviewing it). [Evan Jones] I do not have nearly enough experience in the Python world to evaluate this decision. I've only been programming in Python for about two years now, and as I am sure you are aware, this is my first patch that I have submitted to Python. I don't really know my way around the Python internals, beyond writing basic extensions in C. Martin's opinion is clearly the opposite of yours. ? This is all I recall Martin saying about this: http://mail.python.org/pipermail/python-dev/2005-January/051265.html I'm not certain it is acceptable to make this assumption. Why is it not possible to use the same approach that was previously used (i.e. leak the arenas array)? Do you have something else in mind? I'll talk with Martin about it if he still wants to. Martin, this miserable code must die! Basically, the debate seems to boil down to maintaining backwards compatibility at the cost of making the code in obmalloc.c harder to understand. The "let it leak to avoid thread problems" cruft is arguably the single most obscure bit of coding in Python's code base. I created it, so I get to say that wink. Even 100 lines of comments aren't enough to make it clear, as you've discovered. I've lost track of how many hours of my life have been pissed away explaining it, and its consequences (like how come this or that memory-checking program complains about the memory leak it causes), and the historical madness that gave rise to it in the beginning. I've had enough of it -- the only purpose this part ever had was to protect against C code that wasn't playing by the rules anyway. BFD.
There are many ways to provoke segfaults with C code that breaks the rules, and there's just not anything that special about this way _except_ that I added objectionable (even at the time) hacks to preserve this kind of broken C code until authors had time to fix it. Time's up. The particular case that is being supported could definitely be viewed as a bug in the code that uses obmalloc. It also likely is quite rare. However, until now it has been supported, so it is hard to judge exactly how much code would be affected. People spent many hours searching for affected code when it first went in, and only found a few examples then, in obscure extension modules. It's unlikely usage has grown. The hack was put in for the dubious benefit of the few examples that were found then. It would definitely be a minor barrier to moving to Python 2.5. That's in part what python-dev is for. Of course nobody here has code that will break -- but the majority of high-use extension modules are maintained by people who read this list, so that's not as empty as it sounds. It's also what alpha and beta releases are for. Fear of change isn't a good enough reason to maintain this code. Is there some sort of consensus that is possible on this issue? Absolutely, provided it matches my view 0.5 wink. Rip it out, and if alpha/beta testing suggests that's a disaster, _maybe_ put it back in. ... It turns out that basically the only thing that would change would be removing the volatile specifiers from two of the global variables, plus it would remove about 100 lines of comments. :) The work was basically just hurting my brain trying to reason about the concurrency issues, not changing code. And the brain of everyone else who ever bumps into this. There's a high probability that if this code actually doesn't work (can you produce a formal proof of correctness for it?
I can't -- and I tried), nothing can be done to repair it; and code this outrageously delicate has a decent chance of being buggy no matter how many people stare at it (overlooking that you + me isn't that many). You also mentioned before that removing the volatiles may have given a speed boost, and that's believable. I mentioned above the unending costs in explanations, and nuisance gripes from memory-integrity tools about the deliberate leaks. There are many kinds of ongoing costs here, and no _intended_ benefit anymore (it certainly wasn't my intent to cater to buggy C code forever). It was never legit to do #a without holding the GIL. It was clear as mud whether it was legit to do #b without holding the GIL. If PyMem_Del (etc) change to expand to free in a release build, then #b can remain clear as mud without harming anyone. Nobody should be doing #a anymore. If someone still is, tough luck -- fix it, you've had years of warning is easy for me to live with at this stage. Hmm... The issue is that case #a may not be an easy problem to diagnose: Many errors in C
[Python-Dev] 2.4 func.__name__ breakage
Rev 2.66 of funcobject.c made func.__name__ writable for the first time. That's great, but the patch also introduced what I'm pretty sure was an unintended incompatibility: after 2.66, func.__name__ was no longer *readable* in restricted execution mode. I can't think of a good reason to restrict reading func.__name__, and it looks like this part of the change was an accident. So, unless someone objects soon, I intend to restore that func.__name__ is readable regardless of execution mode (but it will continue to be unwritable in restricted execution mode). Objections? Tres Seaver filed a bug report (some Zope tests fail under 2.4 because of this): http://www.python.org/sf/1124295
Re: [Python-Dev] 2.4 func.__name__ breakage
[Michael Hudson] ... Well, I fixed it on reading the bug report and before getting to python-dev mail :) Sorry if this duplicated your work, but hey, it was only a two line change... Na, the real work was tracking it down in the bowels of Zope's C-coded security machinery -- we'll let you do that part next time wink. Did you add a test to ensure this remains fixed? A NEWS blurb (at least for 2.4.1 -- the test failures under 2.4 are very visible in the Zope world, due to auto-generated test runner failure reports)?
Re: [Python-Dev] Re: [ python-Bugs-1124637 ] test_subprocess is far tooslow (fwd)
[Fredrik Lundh] does anyone ever use the -u options when running tests? Yes -- I routinely do -uall, under both release and debug builds, but only on Windows. WinXP in particular seems to do a good job when hyper-threading is available -- running the tests doesn't slow down anything else I'm doing, except during the disk-intensive tests (test_largefile is a major pig on Windows).
Re: [Python-Dev] 2.4 func.__name__ breakage
[sorry for the near-duplicate msgs -- looks like gmail lied when it claimed the first msg was still in draft status] Did you add a test to ensure this remains fixed? [mwh] Yup. Bless you. Did you attach a contributor agreement and mark the test as being contributed under said contributor agreement, adjacent to your valid copyright notice wink? A NEWS blurb ...? No, I'll do that now. I'm not very good at remembering NEWS blurbs... LOL -- sorry, I'm just imagining what NEWS would look like if we required a contributor-agreement notification on each blurb. I appreciate your work here, and will try to find a drug to counteract the ones I appear to have overdosed on this morning ...
Re: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15%
[Gfeller Martin] what immediately comes to mind are Modules/cPickle.c and Modules/cStringIO.c, which (I believe) are heavily used by ZODB (which in turn is heavily used by the application). I probably guessed right the first time wink: LFH doesn't help with the lists directly, but helps indirectly by keeping smaller objects out of the general heap where the list guts actually live. Say we have a general heap with a memory map like this, meaning a contiguous range of available memory, where 'f' means a block is free. The units of the block don't really matter, maybe one 'f' is one byte, maybe one 'f' is 4MB -- it's all the same in the end:

ffffffffff

Now you allocate a relatively big object (like the guts of a large list), and it's assigned a contiguous range of blocks marked 'b':

bbbfffffff

Then you allocate a small object, marked 's':

bbbsffffff

Then you want to grow the big object. Oops! It can't extend the block of b's in-place, because 's' is in the way. Instead it has to copy the whole darn thing:

fffsbbbbff

But if 's' is allocated from some _other_ heap, then the big object can grow in-place, and that's much more efficient than copying the whole thing. obmalloc has two primary effects: it manages a large number of very small (<= 256 bytes) memory chunks very efficiently, but it _also_ helps larger objects indirectly, by keeping the very small objects out of the platform C malloc's way. LFH appears to be an extension of the same basic idea, raising the "small object" limit to 16KB. Now note that pymalloc and LFH are *bad* ideas for objects that want to grow. pymalloc and LFH segregate the memory they manage into blocks of different sizes. For example, pymalloc keeps a list of free blocks each of which is exactly 64 bytes long. Taking a 64-byte block out of that list, or putting it back in, is very efficient. But if an object that uses a 64-byte block wants to grow, pymalloc can _never_ grow it in-place, it always has to copy it.
That's a cost that comes with segregating memory by size, and for that reason Python deliberately doesn't use pymalloc in several cases where objects are expected to grow over time. One thing to take from that is that LFH can't be helping list-growing in a direct way either, if LFH (as seems likely) also needs to copy objects that grow in order to keep its internal memory segregated by size. The indirect benefit is still available, though: LFH may be helping simply by keeping smaller objects out of the general heap's hair. The lists also get fairly large, although not huge - up to typically 50,000 (complex) objects in the tests I've measured. That's much larger than LFH can handle. Its limit is 16KB. A Python list with 50K elements requires a contiguous chunk of 200KB on a 32-bit machine to hold the list guts. As I said, I don't speak C, so I can only speculate - do the lists at some point grow beyond the upper limit of obmalloc, but are handled by the LFH (which has a higher upper limit, if I understood Tim Peters correctly)? A Python list object comprises two separately allocated pieces of memory. First is a list header, a small piece of memory of fixed size, independent of len(list). The list header is always obtained from obmalloc; LFH will never be involved with that, and neither will the system malloc. The list header has a pointer to a separate piece of memory, which contains the guts of a list, a contiguous vector of len(list) pointers (to Python objects). For a list of length n, this needs 4*n bytes on a 32-bit box. obmalloc never manages that space, and for the reason given above: we expect that list guts may grow, and obmalloc is meant for fixed-size chunks of memory. So the list guts will get handled by LFH, until the list needs more than 4K entries (hitting the 16KB LFH limit). Until then, LFH probably wastes time by copying growing list guts from size class to size class. Then the list guts finally get copied to the general heap, and stay there.
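The header/guts split is easy to see from Python itself. A small sketch, assuming a present-day CPython (sys.getsizeof reports the header plus the pointer vector; exact sizes vary by version and pointer width, so only the per-slot growth is asserted):

```python
import sys

# An empty list is just the fixed-size header plus an empty vector.
empty = sys.getsizeof([])

# 50K slots of guts on top of that header: at least 4 bytes per
# pointer (32-bit), 8 on a 64-bit box.
big = sys.getsizeof([None] * 50_000)

print("header+empty guts:", empty)
print("guts for 50K slots:", big - empty)
assert big - empty >= 50_000 * 4
```

This matches the thread's arithmetic: 50K pointers need at least 200KB of contiguous guts, far past any 16KB small-object limit.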
I'm afraid the only way you can know for sure is by obtaining detailed memory maps and analyzing them.
Re: [Python-Dev] Windows Low Fragementation Heap yields speedup of ~15%
[Tim Peters] ... Then you allocate a small object, marked 's': bbbsffffff [Evan Jones] Isn't the whole point of obmalloc No, because it doesn't matter what follows that introduction: obmalloc has several points, including exploiting the GIL, heuristics aiming at reusing memory while it's still high in the memory hierarchy, almost never touching a piece of memory until it's actually needed, and so on. is that we don't want to allocate s on the heap, since it is small? That's one of obmalloc's goals, yes. But "small" is a relative adjective, not absolute. Because we're primarily talking about LFH here, the natural meaning for "small" in _this_ thread is 16KB, which is much larger than "small" means to obmalloc. The memory-map example applies just as well to LFH as to obmalloc, by changing which meaning for "small" you have in mind. I guess s could be an object that might potentially grow. For example, list guts in Python are never handled by obmalloc, although the small fixed-size list _header_ object is always handled by obmalloc. One thing to take from that is that LFH can't be helping list-growing in a direct way either, if LFH (as seems likely) also needs to copy objects that grow in order to keep its internal memory segregated by size. The indirect benefit is still available, though: LFH may be helping simply by keeping smaller objects out of the general heap's hair. So then wouldn't this mean that there would have to be some sort of small object being allocated via the system malloc that is causing the poor behaviour? Yes. For example, a 300-character string could do it (that's not small to obmalloc, but is to LFH). Strings produced by pickling are very often that large, and especially in Zope (which uses pickles extensively under the covers -- reading and writing persistent objects in Zope all involve pickle strings). As you mention, I wouldn't think it would be list objects, since resizing lists using LFH should be *worse*.
Until they get to LFH's boundary for "small", and we have only the vaguest idea what Martin's app does here -- we know it grows lists containing 50K elements in the end, and ... well, that's all I really know about it wink. A well-known trick is applicable in that case, if Martin thinks it's worth the bother: grow the list to its final size once, at the start (overestimating if you don't know for sure). Then instead of appending, keep an index to the next free slot, same as you'd do in C. Then the list guts never move, so if that doesn't yield the same kind of speedup without using LFH, list copying wasn't actually the culprit to begin with. That would actually be something that is worth verifying, however. Not worth the time to me -- Windows is closed-source, and I'm too old to enjoy staring at binary disassemblies any more. Besides, list guts can't stay in LFH after the list exceeds 4K elements. If list-copying costs are significant here, they're far more likely to be due to copying lists over 4K elements than under -- copying a list takes O(len(list)) time. So the realloc() strategy used by LFH _probably_ isn't of _primary_ interest here. It could be that the Windows LFH is extra clever? Sure -- that I doubt it moves Heaven & Earth to cater to reallocs is just educated guessing. I wrote my first production heap manager at Cray Research, around 1979 wink. ... Well, it would also be useful to find out what code is calling the system malloc. This would make it easy to examine the code and see if it should be calling obmalloc or the system malloc. Any good ideas for easily obtaining this information? I imagine that some profilers must be able to produce a complete call graph? Windows supports extensive facilities for analyzing heap usage, even from an external process that attaches to the process you want to analyze. Ditto for profiling. But it's not easy, and I don't know of any free tools that are of real help.
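The "grow it once" trick reads like this in Python (a sketch; N stands in for the app's known-or-overestimated final size):

```python
# Preallocate the final size, then fill by index, so the list guts
# are allocated once and never move.
N = 50_000                  # overestimate if you don't know for sure

items = [None] * N          # one allocation of the full pointer vector
next_free = 0               # index of the next free slot, as in C
for value in range(N):      # stand-in for the app's real work loop
    items[next_free] = value
    next_free += 1

assert len(items) == N and items[-1] == N - 1
```

If the overestimate was generous, `del items[next_free:]` at the end trims the unused tail.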
If someone were motivated enough, it would probably be easiest to run Martin's app on a Linux box, and use the free Linux tools to analyze it.
Re: [Python-Dev] Re: Re: Re: Prospective Peephole Transformation
[Phillip J. Eby] Still, it's rather interesting that tuple.__contains__ appears slower than a series of LOAD_CONST and == operations, considering that the tuple should be doing basically the same thing, only without bytecode fetch-and-decode overhead. Maybe it's tuple.__contains__ that needs optimizing here? [Fredrik Lundh] wouldn't be the first time... How soon we forget wink. Fredrik introduced a pile of optimizations special-casing the snot out of small integers into ceval.c a long time ago, like this in COMPARE_OP:

case COMPARE_OP:
    w = POP();
    v = TOP();
    if (PyInt_CheckExact(w) && PyInt_CheckExact(v)) {
        /* INLINE: cmp(int, int) */
        register long a, b;
        register int res;
        a = PyInt_AS_LONG(v);
        b = PyInt_AS_LONG(w);
        switch (oparg) {
        case PyCmp_LT: res = a <  b; break;
        case PyCmp_LE: res = a <= b; break;
        case PyCmp_EQ: res = a == b; break;
        case PyCmp_NE: res = a != b; break;
        case PyCmp_GT: res = a >  b; break;
        case PyCmp_GE: res = a >= b; break;
        case PyCmp_IS: res = v == w; break;
        case PyCmp_IS_NOT: res = v != w; break;
        default: goto slow_compare;
        }
        x = res ? Py_True : Py_False;
        Py_INCREF(x);
    }
    else {
      slow_compare:
        x = cmp_outcome(oparg, v, w);
    }

That's a hell of a lot faster than tuple comparison's deferral to PyObject_RichCompareBool can be, even if we inlined the same blob inside the latter (then we'd still have the additional overhead of calling PyObject_RichCompareBool). As-is, PyObject_RichCompareBool() has to do (relatively) significant work just to find out which concrete comparison implementation to call. As a result, i == j in Python source code, when i and j are little ints, is much faster than comparing i and j via any other route in Python. That's mostly really good, IMO -- /F's int optimizations are of major value in real life. Context-dependent optimizations make code performance less predictable too -- that's life.
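The two routes being compared are easy to time with timeit. A rough sketch only: the numbers (and even which route wins) depend on interpreter version and machine, so none are asserted here; the point is just that both spellings of the same membership test exist and can be measured:

```python
import timeit

# Chain of == comparisons, which hits the inlined int fast path...
chain = timeit.timeit('x == 1 or x == 2 or x == 3',
                      setup='x = 3', number=200_000)

# ...versus tuple.__contains__, which goes through the generic
# rich-comparison machinery per element.
contains = timeit.timeit('x in (1, 2, 3)',
                         setup='x = 3', number=200_000)

print('== chain:  %.4fs' % chain)
print('in tuple:  %.4fs' % contains)
```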
Re: [Python-Dev] Prospective Peephole Transformation
[Raymond Hettinger] ... The problem with the transformation was that it didn't handle the case where x was non-hashable and it would raise a TypeError instead of returning False as it should. I'm very glad you introduced the optimization of building small constant tuples at compile-time. IMO, that was a pure win. I don't like this one, though. The meaning of x in (c1, c2, ..., c_n) is x == c1 or x == c2 or ... or x == c_n, and a transformation that doesn't behave exactly like the latter in all cases is simply wrong. Even if x isn't hashable, it could still be of a type that implements __eq__, and where x.__eq__(c_i) returned True for some i, and then False is plainly the wrong result. It could also be that x is of a type that is hashable, but where x.__hash__() raises TypeError at this point in the code. That could be for good or bad (bug) reasons, but suppressing the TypeError and converting it into False would be a bad thing regardless. That situation arose once in the email module's test suite. I don't even care if no code in the standard library triggered a problem here: the transformation isn't semantically correct on the face of it. If we knew the type of x at compile-time, then sure, in most (almost all) cases we could know it was a safe transformation (and even without the hack to turn TypeError into False). But we don't know now, so the worst case has to be assumed: can't do this one now. Maybe someday, though.
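The unhashable-but-equal case Tim describes is concrete in a few lines. A sketch using a made-up class (`Weird` is hypothetical; the frozenset lookup stands in for the hash-based form the transformation would have produced):

```python
# An unhashable type can still compare equal, so the equality chain
# and the hash-based lookup genuinely disagree.
class Weird(list):              # lists are unhashable but support ==
    def __eq__(self, other):
        return other == 1
    __hash__ = None

x = Weird()
print(x in (1, 2))              # True: the == chain succeeds

try:
    x in frozenset((1, 2))      # the "optimized" form needs hash(x)...
except TypeError as e:
    print("TypeError:", e)      # ...and raises instead
```

Returning False (or swallowing the TypeError) in the second form would silently change the meaning of the first, which is exactly the objection.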
Re: [Python-Dev] Windows Low Fragmentation Heap yields speedup of ~15%
[Tim Peters] grow the list to its final size once, at the start (overestimating if you don't know for sure). Then instead of appending, keep an index to the next free slot, same as you'd do in C. Then the list guts never move, so if that doesn't yield the same kind of speedup without using LFH, list copying wasn't actually the culprit to begin with. [Evan Jones] If this *does* improve the performance of his application by 15%, that would strongly argue for an addition to the list API similar to Java's ArrayList.ensureCapacity or the STL's vector<T>::reserve. Since the list implementation already maintains separate ints for the list array size and the list occupied size, this would really just expose this implementation detail to Python. I don't like revealing the implementation in this fashion, but if it does make a significant performance difference, it could be worth it. That's a happy thought! It was first suggested for Python in 1991 wink, but before Python 2.4 the list implementation didn't have separate members for current size and capacity, so can't get there from here was the only response. It still wouldn't be trivial, because nothing in listobject.c now believes the allocated size ever needs to be preserved, and all len()-changing list operations ensure that not too much overallocation remains (see list_resize() in listobject.c for details). But let's see whether it would help first.
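Tim's workaround needs no new API at all. A minimal sketch of the preallocate-and-index pattern (the function name is mine, for illustration):

```python
# Preallocate the list once, then fill slots by index, as you'd do in C.
# The list's backing array is never resized inside the loop, so its guts
# never move -- the effect Tim suggests measuring against LFH.
def build_squares(n):
    out = [None] * n     # grow to final size once, up front
    i = 0                # index of the next free slot
    for k in range(n):
        out[i] = k * k
        i += 1
    return out

print(build_squares(5))  # [0, 1, 4, 9, 16]
```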
[Python-Dev] Useful thread project for 2.5?
Florent Guillaume recently wrote a valuable addin for Zope: http://www.zope.org/Members/nuxeo/Products/DeadlockDebugger When a Zope has threads that are hung, this can give a report of Python's current state (stack trace) across all threads -- even the ones that are hung (the deadlocked threads don't have to cooperate). The same flavor of thing would (of course) be handy outside of Zope too -- debugging deadlocked Python threads is a PITA regardless of context. Florent's DeadlockDebugger in turn builds on an external C threadframe module: http://www.majid.info/mylos/stories/2004/06/10/threadframe.html Folding the functionality of that (or similar functionality) into the core would, IMO, be a valuable addition for 2.5, and would make an excellent intro project for an aspiring contributor interested in how threads work in CPython (what this module does is conceptually simple). It belongs in the core because it's not safe to chase the tstate chain without holding pystate.c's internal head_mutex lock (holding the GIL isn't enough -- it's normal practice to call PyThreadState_Delete() while not holding the GIL). I'd do it myself (and maybe I will anyway), but this really would make a good (finite; conceptually simple) project for someone who wants to gain Python developer experience.
Re: [Python-Dev] Useful thread project for 2.5?
[Phillip J. Eby] What would you suggest calling it? sys._current_frames(), returning a dictionary? I don't fight about names -- anything that doesn't make Guido puke works wink. I channel that sys._current_frames() would be fine. A dict mapping thread id to current thread frame would be lovely!
Re: [Python-Dev] Re: Useful thread project for 2.5?
[Greg Ward] What would be *really* spiffy is to provide a way for externally-triggered thread dumps. This is one of my top two Java features [1]. The way this works in Java is a bit awkward -- kill -QUIT the Java process and it writes a traceback for every running thread to stdout -- but it works. Something similar ought to be possible for Python, although optional (because Python apps can handle signals themselves, unlike Java apps). It could be as simple as this: calling sys.enablethreaddump(signal=signal.SIGQUIT) from the program enables externally-triggered thread dumps via the specified signal. See the link in my original post to Florent's Zope deadlock debugger. Things like the above are easy enough to create _given_ a bit of C code in the core to build on. [Phillip J. Eby] Couldn't this just be done with traceback.print_stack(), given the _current_frames() facility? Right. About the only real problem with it is that the other threads can keep running while you're trying to print the stack dumps. I don't know that it matters. If you're debugging a deadlocked thread, its stack isn't going to change. If you're trying to find out where unexpected time is getting swallowed, statements in the offending loop(s) are still going to show up in the stack trace. I suppose you could set the check interval really high and then restore it afterwards as a sneaky way of creating a critical section. Unfortunately, there's no getcheckinterval(). sys.getcheckinterval() was added in Python 2.3. Which reminds me, btw, it would be nice while we're adding more execution control functions to have a way to get the current trace hook and profiling hook, not to mention ways to set them on non-current threads. You can do these things from C of course; I mean accessible as part of the Python API. Huh. It didn't remind me of those at all wink. 
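The facility discussed here did land later as sys._current_frames() (added in Python 2.5). A minimal dump-all-threads sketch of my own, built on it plus traceback.print_stack(), of the kind Florent's debugger produces:

```python
# Dump a stack trace for every thread in the process.  The threads being
# dumped don't have to cooperate -- sys._current_frames() snapshots the
# current frame of each thread under the interpreter's control.
import sys
import traceback

def dump_all_threads(out=sys.stderr):
    for tid, frame in sys._current_frames().items():
        print("\n--- Thread %d ---" % tid, file=out)
        traceback.print_stack(frame, file=out)

dump_all_threads()
```

Hooking this up to a signal handler, as Greg suggests, is then a few more lines with the signal module.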
[Python-Dev] Can't build Zope on Windows w/ 2.4.1c1
I don't know how far I'll get with this. Using the current Zope-2_7-branch of the Zope module at cvs.zope.org:/cvs-repository, building Zope via python setup.py build_ext -i worked fine when I got up today, using the released Python 2.4. One of its tests fails, because of a Python bug that should be fixed in 2.4.1. So I wanted to test that. After uninstalling 2.4, then installing 2.4.1c1, then deleting Zope's lib\python\build directory, the attempt to build Zope works fine for quite a while (successfully compiles many chunks of Zope C code), but dies here:

    ...
    building 'ZODB.cPickleCache' extension
    C:\Program Files\Microsoft Visual Studio .NET 2003\Vc\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\Code\Zope-2_7-branch\lib\Components\ExtensionClass\src -IC:\python24\include -IC:\python24\PC /TcZODB/cPickleCache.c /Fobuild\temp.win32-2.4\Release\ZODB/cPickleCache.obj
    cPickleCache.c
    error: No such file or directory

I don't know which file it's complaining about. In contrast, output I saved from building Zope with 2.4 this morning appears identical up to this point:

    ...
    building 'ZODB.cPickleCache' extension
    C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\Code\Zope-2_7-branch\lib\Components\ExtensionClass\src -IC:\python24\include -IC:\python24\PC /TcZODB/cPickleCache.c /Fobuild\temp.win32-2.4\Release\ZODB/cPickleCache.obj
    cPickleCache.c

but then, instead of an error, it goes on to link (and build more C stuff):

    C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\python24\libs /LIBPATH:C:\python24\PCBuild /EXPORT:initcPickleCache build\temp.win32-2.4\Release\ZODB/cPickleCache.obj /OUT:ZODB\cPickleCache.pyd /IMPLIB:build\temp.win32-2.4\Release\ZODB\cPickleCache.lib
    Creating library build\temp.win32-2.4\Release\ZODB\cPickleCache.lib and object build\temp.win32-2.4\Release\ZODB\cPickleCache.exp
    building 'ZODB.TimeStamp' extension

Gets stranger: cPickleCache.c is really part of ZODB, and python setup.py build continues to work fine with 2.4.1c1 from a ZODB3 checkout (and using the same branch tag as was used for Zope). Anyone change anything here for 2.4.1 that rings a bell? Can anyone confirm that they can or can't build current Zope 2.7 on Windows with 2.4.1c1 too? If it's not unique to my box, is it unique to Windows (e.g., can anyone build current Zope on Linux with 2.4.1c1)?
Re: [Python-Dev] Can't build Zope on Windows w/ 2.4.1c1
[Anthony Baxter] It works on Linux, with Zope 2.7.4. Thanks! Just as a note to others (I've mentioned this to Tim already) if you set an environment variable DISTUTILS_DEBUG before running a setup.py, you get very verbose information about what's going on, and, more importantly, full tracebacks rather than terse error messages. error: No such file or directory looks like a distutils error message. This helped a lot, although I'm even more confused now:

    building 'ZODB.cPickleCache' extension
    C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin\cl.exe /c /nologo /Ox /MD /W3 /GX /DNDEBUG -IC:\Code\Zope-2_7-branch\lib\Components\ExtensionClass\src -IC:\python24\include -IC:\python24\PC /TcZODB/cPickleCache.c /Fobuild\temp.win32-2.4\Release\ZODB/cPickleCache.obj
    cPickleCache.c
    error: No such file or directory
    Traceback (most recent call last):
      File "C:\Code\Zope-2_7-branch\setup.py", line 1094, in ?
        distclass=ZopeDistribution,
      File "C:\python24\lib\distutils\core.py", line 149, in setup
        dist.run_commands()
      File "C:\python24\lib\distutils\dist.py", line 946, in run_commands
        self.run_command(cmd)
      File "C:\python24\lib\distutils\dist.py", line 966, in run_command
        cmd_obj.run()
      File "C:\python24\lib\distutils\command\build_ext.py", line 279, in run
        self.build_extensions()
      File "C:\python24\lib\distutils\command\build_ext.py", line 405, in build_extensions
        self.build_extension(ext)
      File "C:\python24\lib\distutils\command\build_ext.py", line 502, in build_extension
        target_lang=language)
      File "C:\python24\lib\distutils\ccompiler.py", line 847, in link_shared_object
        extra_preargs, extra_postargs, build_temp, target_lang)
      File "C:\python24\lib\distutils\msvccompiler.py", line 422, in link
        if not self.initialized: self.initialize()
      File "C:\python24\lib\distutils\msvccompiler.py", line 239, in initialize
        os.environ['path'] = string.join(self.__paths, ';')
      File "C:\python24\lib\os.py", line 419, in __setitem__
        putenv(key, item)
    OSError: [Errno 2] No such file or directory

LOL -- or something
wink. Before going to sleep, Anthony suggested that Bug #1110478: Revert os.environ.update to do putenv again. might be relevant. HTF can we get a no such file or directory out of a putenv()?! Don't be shy.
Re: [Python-Dev] Can't build Zope on Windows w/ 2.4.1c1
This is going to need someone who understands distutils internals. The strings we end up passing to putenv() grow absurdly large, and sooner or later Windows gets very unhappy with them. os.py has an

    elif name in ('os2', 'nt'):  # Where Env Var Names Must Be UPPERCASE

clause controlling introduction of a _Environ class. I changed its __setitem__ like so:

    def __setitem__(self, key, item):
        if key.upper() == "PATH":            # new line
            print "len(item)", len(item)     # new line
        putenv(key, item)
        self.data[key.upper()] = item

As setup.py build_ext -i goes on while compiling Zope, this is the output before putenv barfs: len(item) 1025 len(item) 1680 len(item) 2335 len(item) 2990 len(item) 3645 len(item) 4300 len(item) 4955 len(item) 5610 len(item) 6265 len(item) 6920 len(item) 7575 len(item) 8230 len(item) 8885 len(item) 9540 len(item) 10195 len(item) 10850 len(item) 11505 len(item) 12160 len(item) 12815 len(item) 13470 len(item) 14125 len(item) 14780 len(item) 15435 len(item) 16090 len(item) 16745 len(item) 17400 len(item) 18055 len(item) 18710 len(item) 19365 len(item) 20020 len(item) 20675 len(item) 21330 len(item) 21985 len(item) 22640 len(item) 23295 len(item) 23950 len(item) 24605 len(item) 25260 len(item) 25915 len(item) 26570 len(item) 27225 len(item) 27880 len(item) 28535 len(item) 29190 len(item) 29845 len(item) 30500 len(item) 31155 len(item) 31810 len(item) 32465 The PATH isn't gibberish at this point -- it just keeps adding the MSVC directories (like C:\\Program Files\\Microsoft Visual Studio .NET 2003\\Vc7\\bin) again and again and again. I don't know what the environ limits are on various flavors of Windows; empirically, on my Win XP Pro SP2 box, the values have to be < 32K or putenv() dies. But there's surely no need for distutils to make PATH grow without bound, so I think this is a distutils bug. A workaround for building Zope is easy but embarrassing wink: kill setup.py before it hits this error, then start it again. Lather, rinse, repeat.
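The obvious repair for this class of bug can be sketched in a few lines. This is a hedged illustration of mine (the helper name prepend_dirs is hypothetical, not the actual distutils fix, which instead stopped re-running the compiler initialization): only prepend directories that aren't already on PATH, so repeated initialization can't grow the variable without bound.

```python
# Idempotent PATH extension: re-running it never grows the value,
# unlike the repeated blind prepending that blew past the ~32K limit.
import os

def prepend_dirs(path, new_dirs, sep=os.pathsep):
    have = path.split(sep)
    to_add = [d for d in new_dirs if d not in have]   # skip duplicates
    return sep.join(to_add + have) if to_add else path

p = prepend_dirs("C:/old", ["C:/msvc/bin", "C:/old"], sep=";")
assert p == "C:/msvc/bin;C:/old"
assert prepend_dirs(p, ["C:/msvc/bin"], sep=";") == p  # no growth on rerun
```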
After a few iterations, everything builds fine.
Re: [Python-Dev] Can't build Zope on Windows w/ 2.4.1c1
[A.M. Kuchling] In distutils.msvccompiler:

    def __init__ (self, verbose=0, dry_run=0, force=0):
        ...
        self.initialized = False

    def compile(self, sources, output_dir=None, macros=None, include_dirs=None,
                debug=0, extra_preargs=None, extra_postargs=None, depends=None):
        if not self.initialized: self.initialize()
        ...

    def initialize(self):
        ...

does not seem to set self.initialized to True! I think the fix is to add 'self.initialized = True' to the initialize() method, but can't test it (no Windows). This fix should also go into 2.4.1-final, I expect. Dunno, but sounds good. We certainly ought to fix it for 2.4.1 final, whatever the fix is. I opened a bug report: http://www.python.org/sf/1160802
Re: [Python-Dev] RELEASED Python 2.4.1, release candidate 1
[Martin v. Löwis] I'd like to encourage feedback on whether the Windows installer works for people. It replaces the VBScript part in the MSI package with native code, which ought to drop the dependency on VBScript, but might introduce new incompatibilities. Worked fine here. Did an all-default all users install, WinXP Pro SP2, from local disk, and under an account with Admin rights. I uninstalled 2.4 first. I suppose that's the least stressful set of choices I could possibly have made, but at least it confirms a happy baseline.
Re: [Python-Dev] distutils fix for building Zope against Python 2.4.1c1
[Trent Mick] Investigation has turned up that I cannot keep my Python trees straight. That patch *does* fix building PyWin32 against 2.4.1c1. Good! Please send a check for US$1000.00 to the PSF by Monday wink.
Re: [Python-Dev] sum()
FYI, there are a lot of ways to do accurate fp summation, but in general people worry about it too much (except for those who don't worry about it at all -- they're _really_ in trouble 0.5 wink). One clever way is to build on the fact that whenever |x| and |y| are within a factor of 2 of each other, x+y is exact in 754 arithmetic. So you never lose information (not even one bit) when adding two floats with the same binary exponent. That leads directly to this kind of code:

    from math import frexp

    class Summer:
        def __init__(self):
            self.exp2sum = {}

        def add(self, x):
            while 1:
                exp = frexp(x)[1]
                if exp in self.exp2sum:
                    x += self.exp2sum.pop(exp)  # exact!
                else:
                    break
            self.exp2sum[exp] = x  # trivially exact

        def total(self):
            items = self.exp2sum.items()
            items.sort()
            return sum((x for dummy, x in items), 0.0)

exp2sum maps a binary exponent to a float having that exponent. If you pass a sequence of fp numbers to .add(), then ignoring underflow/overflow endcases, the key invariant is that the exact (mathematical) sum of all the numbers passed to add() so far is equal to the exact (mathematical) sum of exp2sum.values(). While it's not obvious at first, the total number of additions performed inside add() is typically a bit _less_ than the total number of times add() is called. More importantly, len(exp2sum) can never be more than about 2K. The worst case for that is having one input with each possible binary exponent, like 2.**-1000 + 2.**-999 + ... + 2.**999 + 2.**1000. No inputs are like that in real life, and exp2sum typically has no more than a few dozen entries. total() then adds those, in order of increasing exponent == in order of increasing absolute value. This can lose information, but there is no bad case for it, in part because there are typically so few addends, and in part because that no two addends have the same binary exponent implies massive cancellation can never occur. Playing with this can show why most fp apps shouldn't worry most often.
Example: Get a million floats of about the same magnitude:

    >>> xs = [random.random() for dummy in xrange(1000000)]

Sum them accurately:

    >>> s = Summer()
    >>> for x in xs:
    ...     s.add(x)

No information has been lost yet (if you could look at your FPU's inexact result flag, you'd see that it hasn't been set yet), and the million inputs have been squashed into just a few dozen buckets:

    >>> len(s.exp2sum)
    22
    >>> from pprint import pprint
    >>> pprint(s.exp2sum)
    {-20: 8.8332388070710977e-007,
     -19: 1.4206079529399673e-006,
     -16: 1.0065260162672729e-005,
     -15: 2.4398649189794064e-005,
     -14: 5.3980784313178987e-005,
     -10: 0.00074737138777436485,
     -9: 0.0014605198999595448,
     -8: 0.003361820812962546,
     -7: 0.0063811680318408559,
     -5: 0.016214300821313588,
     -4: 0.044836286041944229,
     -2: 0.17325355843673518,
     -1: 0.46194788522906305,
     3: 6.4590200674982423,
     4: 11.684394209886134,
     5: 24.715676913177944,
     6: 49.056084672323166,
     10: 767.69329043309051,
     11: 1531.1560084859361,
     13: 6155.484212371357,
     17: 98286.760386143636,
     19: 393290.34884990752}

Add those 22, smallest to largest:

    >>> s.total()
    500124.06621686369

Add the originals, left to right:

    >>> sum(xs)
    500124.06621685845

So they're the same to about 14 significant decimal digits. No good if exact pennies are important, but far more precise than real-world measurements. How much does sorting first help?

    >>> xs.sort()
    >>> sum(xs)
    500124.06621685764

Not much! It actually moved a bit in the wrong direction (which is unusual, but them's the breaks). Using decimal as a (much more expensive) sanity check:

    >>> import decimal
    >>> t = decimal.Decimal(0)
    >>> for x in xs:
    ...     t += decimal.Decimal(repr(x))
    >>> print t
    500124.0662168636972127329455

Of course wink Summer's result is very close to that.
Re: [Python-Dev] sum()
[Raymond Hettinger] Computing an error term can get the bit back and putting that term back in the input queue restores the overall sum. Right! Once the inputs are exhausted, the components of exp2sum should be exact. Ditto. I'll cover some subtleties below:

    from math import frexp
    from itertools import chain

    def summer2(iterable):
        exp2sum = {}
        queue = []
        for x in chain(iterable, queue):
            mant, exp = frexp(x)
            while exp in exp2sum:
                y = exp2sum.pop(exp)
                z = x + y
                d = x - z + y  # error term
                assert z + d == x + y

Much more is true there, but hard to assert in Python: the mathematical sum z+d is exactly equal to the mathematical sum x+y there, and the floating-point |d| is less than 1 unit in the last place relative to the floating-point |z|. It would be clearer to name z as hi and d as lo. More, that's not _generally_ true: it relies on that x and y have the same binary exponent. For example, pick x=1 and y=1e100, and you end up with hi=1e100 and lo=0. The mathematical x+y isn't equal to the mathematical hi+lo then. It's a real weakness of Kahan summation that most spellings of it don't bother to account for this; it's sufficient to normalize first, swapping x and y if necessary so that abs(x) >= abs(y) (although that isn't needed _here_ because we know they have the same binary exponent). There's another way to handle the general case that doesn't require test, branch, or abs(), but at the cost of several more addition/subtractions.

                if d:
                    queue.append(d)

If x and y have different signs, this part doesn't trigger. If all inputs are positive, then we expect it to trigger about half the time (the cases where exactly one of x's and y's least-significant bits is set). So the queue can be expected to grow to about half the length of the original iterable by the time the original iterable is exhausted.
                x = z
                mant, exp = frexp(x)
            exp2sum[exp] = x
        return sum(sorted(exp2sum.itervalues(), key=abs), 0.0)

The implementation can be tweaked to consume the error term right away so the queue won't grow by more than few pending items. Theorem 10 in Shewchuk's Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates gives the obvious wink correct way to do that, although as a practical matter it greatly benefits from a bit more logic to eliminate zero entries as they're produced (Shewchuk doesn't because it's not his goal to add a bunch of same-precision floats). BTW, extracting binary exponents isn't necessary in his approach (largely specializations to perfect 754 arithmetic of Priest's earlier less-demanding methods). Also, the speed can be boosted by localizing frexp, exp2sum.pop, and queue.append. I'm very glad you quit while it was still interesting wink. People who are paranoid about fp sums should use something like this. Even pairing is prone to disaster, given sufficiently nasty input. For example:

    >>> xs = [1, 1e100, 1, -1e100] * 10000
    >>> sum(xs)  # and the obvious pairing method gives the same
    0.0
    >>> sum(sorted(xs))  # the result is nearly incomprehensible
    -8.0076811737544552e+087
    >>> sum(sorted(xs, reverse=True))
    8.0076811737544552e+087
    >>> summer2(xs)  # exactly right
    20000.0
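The normalization mentioned above (swap so abs(x) >= abs(y)) is the classic error-free transformation behind compensated summation. A sketch of mine for the general case, not restricted to operands with equal binary exponents:

```python
# Error-free transformation of one float addition: after normalizing so
# |x| >= |y|, hi is the rounded sum fl(x+y) and lo is the exact rounding
# error, so hi + lo is mathematically equal to x + y (Dekker's Fast2Sum).
def two_sum(x, y):
    if abs(x) < abs(y):
        x, y = y, x          # normalize: |x| >= |y|
    hi = x + y
    lo = y - (hi - x)        # exact error term
    return hi, lo

hi, lo = two_sum(1.0, 1e100)
print(hi, lo)  # 1e+100 1.0 -- the low-order bit naive Kahan would lose
```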
Re: [Python-Dev] Re: distutils fix for building Zope against Python 2.4.1c1
[Tim] Don't think it's needed on HEAD. As the checkin comment said: This doesn't appear to be an issue on the HEAD (MSVCCompiler initializes itself via __init__() on the HEAD). I verified by building Zope with unaltered HEAD too, and had no problem with that. [Martin] Are you sure your HEAD is current? Of course I was sure. I was also wrong about that, but I resent the implication that I wasn't sure wink. I'll port the fix sometime this weekend. Thanks!
Re: [Python-Dev] Open issues for 2.4.1
[Anthony Baxter] So here's a list of open items I'm thinking about for the 2.4.1 release. [... sorry, but my editor automatically deletes all paragraphs mentioning problems with Unicode ...] - The unittest changes. Changes to unittest to fix subclassing broke Zope's unittests. Should this change be reverted before 2.4.1, No. or was the Zope test suite doing something particularly dodgy? Yes, it was overriding a documented, public method, but with no awareness that it was overriding anything -- a subclass just happened to define a method with the same name as an advertised unittest base class method. I'm talking about: - unittest.TestCase.run() and unittest.TestSuite.run() can now be successfully extended or overridden by subclasses. Formerly, the subclassed method would be ignored by the rest of the module. (Bug #1078905). Since there was no _intent_ in the zdaemon tests to override run(), it's just an accident that unittest used to ignore run() overrides. The fix in zdaemon source was s/run/_run/g. It shouldn't have used run to begin with. I've not looked too closely at the broken Zope code - can someone from ZC comment? Not officially wink. How likely is it that other programs will also have been broken by this? Approximately equal to the odds that someone else defined a method named run() in a TestCase or TestSuite subclass without realizing they were actually overriding an advertised base class method (but one that didn't actually work as advertised, so there was no visible consequence before). ZC has a relatively huge number of test suites, and it should be noted that only the zdaemon suite was affected by this. Since the zdaemon tests don't run at all on Windows, I never noticed this; if I had, I would have changed the zdaemon test source as a matter of course (i.e., the thought Python bug wouldn't have crossed my mind in this case).
At this point, I'm leaning (very slightly) towards the feeling that this should probably be backed out before 2.4.1, mostly because it seems to me that this is an incompatibility, rather than a pure bug fix. It looked like pilot error on zdaemon's side. I'm now pretty sure we need a 2.4.1rc2 for this week, and a 2.4.1 final the week after. There's been a few too many changes for my tastes to say that going straight to a 2.4.1 final is a prudent course of action. OK by me!
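The accident being discussed is easy to reproduce. A sketch of my own (not zdaemon's actual code) of a TestCase subclass that defines run() for its own purposes, unaware it is overriding an advertised base-class method that unittest now honors:

```python
# A subclass method that happens to be named run() now shadows
# unittest.TestCase.run(), so the test machinery calls it instead.
import unittest

class AccidentalOverride(unittest.TestCase):
    def run(self, *args, **kwargs):     # meant as a helper, not an override
        return "not running any tests"

    def test_something(self):
        pass                            # never reached via run()

t = AccidentalOverride("test_something")
print(t.run())  # not running any tests
```

The zdaemon-style fix is simply to rename the helper (s/run/_run/), as described above.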
Re: [Python-Dev] Rationale for sum()'s design?
[Guido van Rossum] Um, Python doesn't provide a lot of special support for numbers apart from literals -- sum() should support everything that supports the + operator, just like min() and max() support everything that supports comparison, etc. The discussion in the patch that introduced it may be illuminating: http://www.python.org/sf/724936 From your (Guido's) first comment there, it seems clear that sum() was originally intended only for numbers. Then it got generalized. Then sequences of strings specifically were disallowed. Then it was suggested that mention of a sequence of lists or tuples be removed from the docs, and that datetime.timedelta() be used in examples where 0 didn't make sense as the identity. Then Alex changed it to disallow any sequence of sequences. Then you suggested either specifically disallowing only sequences of lists, tuples or strings (but allowing all other sequence types as elements), _or_ going back to disallowing only sequences of strings. Alex took the latter suggestion, and that's where it ended. The closest thing to a design rationale I found is Guido's Pronouncement here, and I think it covers most issues raised this time around: http://mail.python.org/pipermail/python-dev/2003-April/034853.html The optional argument was my fault. The rest was Guido's wink: If we add an optional argument for Tim's use case, it could be used in two different ways: (1) only when the sequence is empty, (2) always used as a starting point. IMO (2) is more useful and more consistent. Nobody opposed that. I remember silently agreeing at the time, just because #2 had precedent in other languages, and seemed easiest to explain: sum(seq, start=0) same-as start + seq[0] + seq[1] + ...
Re: [Python-Dev] Rationale for sum()'s design?
[Alex Martelli] I'm reasonably often using sum on lists of datetime.timedelta instances, durations, which ARE summable just like numbers even though they aren't numbers. I believe everything else for which I've used sum in production code DOES come under the general concept of numbers, in particular X+0 == X. Unfortunately, this equation doesn't hold when X is a timedelta, as X+0 raises an exception then. I count timedeltas as numbers too -- or at least I did when sum() was being designed, cuz I asked for the optional start= argument, and precisely in order to sum things like this (timedelta was indeed the driving example at the time). I can't say it bothers me to specify an appropriate identity element when 0 is inappropriate. If it switched to ignoring the start= argument unless the sequence was empty, that would be OK too, although I think sum(seq, start=0) same-as start + seq[0] + seq[1] + ... will always be easiest to explain, so all else being equal I prefer current behavior. The number of times I've passed a sequence to sum() with elements that were lists, tuples, or any other kind of object with a concatenate meaning for __add__ is zero so far.
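The timedelta use case above in two lines, showing both why the default start of 0 fails and how the explicit identity element fixes it:

```python
# sum() starts from 0 by default, and 0 + timedelta raises TypeError;
# passing an explicit identity element (timedelta(0)) makes it work.
from datetime import timedelta

durations = [timedelta(hours=1), timedelta(minutes=30), timedelta(seconds=45)]

try:
    sum(durations)                    # 0 + timedelta(...) -> TypeError
except TypeError:
    print("0 is not an identity for timedelta")

total = sum(durations, timedelta(0))  # start= as the identity element
print(total)  # 1:30:45
```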
Re: [Python-Dev] Problems with definition of _POSIX_C_SOURCE
[Jack Jansen] On a platform I won't mention here I'm running into problems compiling Python, because of it defining _POSIX_C_SOURCE. ... Does anyone know what the real meaning of this define is? LOL. Here's the Official Story: http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_02.html Look near the top, under The _POSIX_C_SOURCE Feature Test Macro. This will tell you: When an application includes a header described by IEEE Std 1003.1-2001, and when this feature test macro is defined to have the value 200112L: yadda yadda yadda yadda yadda yadda yadda yadda Then again, every journey of a million miles begins with 200112L small steps ...
Re: [Python-Dev] thread semantics for file objects
[Jeremy Hylton] ... Universal newline reads and get_line() both lock the stream if the platform supports it. So I expect that they are atomic on those platforms. Well, certainly not get_line(). That locks and unlocks the stream _inside_ an enclosing for-loop. Looks quite possible for different threads to read different parts of the same line if multiple threads are trying to do get_line() simultaneously. It releases the GIL inside the for-loop too, so other threads _can_ sneak in. We put a lot of work into speeding those getc()-in-a-loop functions. There was undocumented agreement at the time that they should be thread-safe in this sense: provided the platform C stdio wasn't thread-braindead, then if you had N threads all simultaneously reading a file object containing B bytes, while nobody wrote to that file object, then the total number of bytes seen by all N threads would sum to B at the time they all saw EOF. This was a much stronger guarantee than Perl provided at the time (and, for all I know, still provides), and we (at least I) wrote little test programs at the time demonstrating that the total number of bytes Perl saw in this case was unpredictable, while Python's did sum to B. Of course Perl didn't document any of this either, and in Pythonland it was clearly specific to the horrid tricks in CPython's fileobject.c. But it certainly seems safe to conclude this is a quality of implementation issue. Or a sheer pigheadedness-of-implementor issue wink. Otherwise, why bother with the flockfile() at all, right? Or is there some correctness issue I'm not seeing that requires the locking for some basic safety in the implementation. There are correctness issues, but we still ignore them; locking relieves, but doesn't solve, them. For example, C doesn't (and POSIX doesn't either!) define what happens if you mix reads with writes on a file opened for update unless a file-positioning operation (like seek) intervenes, and that's pretty easy for threads to run afoul of.
Python does nothing to stop you from trying, and behavior if you do is truly all over the map across boxes. IIRC, one of the multi-threaded test programs I mentioned above provoked ugly death in the bowels of MS's I/O libraries when I threw an undisciplined writer thread into the mix too. This was reported to MS, and their response was so don't do that -- it's undefined. Locking the stream at least cuts down the chance of that happening, although that's not the primary reason for it. Heck, we still have a years-open critical bug against segfaults when one thread tries to close a file object while another thread is reading from it, right? And even using a lock is stupid. ZODB's FileStorage is bristling with locks protecting multi-threaded access to file objects, therefore that can't be stupid. QED Using a lock seemed like a good idea there and still seems like a good idea now :-). Damn straight, and we're certain it has nothing to do with those large runs of NUL bytes that sometimes overwrite peoples' critical data for no reason at all wink. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
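The N-readers invariant Tim describes (N threads reading one file object, no writers, bytes seen summing to B) still holds in today's CPython, where the buffered file object serializes reads with its own lock. A sketch in modern Python; the file size and thread count here are arbitrary choices:

```python
import os
import tempfile
import threading

# Write a scratch file of B bytes, then let N threads share one file
# object; with no writer in the mix, the chunks the threads see should
# partition the file, so their lengths sum to B.
B = 256 * 1024
N = 4

fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(B))
os.close(fd)

totals = [0] * N
f = open(path, 'rb')

def reader(i):
    while True:
        chunk = f.read(4096)   # the file object serializes these reads
        if not chunk:          # all threads eventually see EOF
            break
        totals[i] += len(chunk)

threads = [threading.Thread(target=reader, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
f.close()
os.remove(path)

assert sum(totals) == B
```

Which thread sees which chunk is unpredictable; only the total is guaranteed.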
Re: [Python-Dev] [AST] Procedure for AST Branch patches
[Brett C.] Make sure AST is used in the subject line; e.g., [AST] at the beginning. Unfortunately the AST group is only available for patches, not for bug reports (don't know why; can this be fixed?). Your wish is my command: there's an AST group in Python's bug tracker now. FYI, each tracker has a distinct set of metadata choices, and nothing shows up in any of 'em by magic. Other than that, just assign it to me since I will most likely be doing AST work in the near future. Unfortunately, auto-assign keys off Category instead of Group metadata.
Re: [Python-Dev] longobject.c ob_size
[Michael Hudson] Asking mostly for curiosity, how hard would it be to have longs store their sign bit somewhere less aggravating? Depends on where that is. It seems to me that the top bit of ob_digit[0] is always 0, for example, Yes, the top bit of ob_digit[i], for all relevant i, is 0 on all platforms now. and I'm sure this would result in no less convolution in longobject.c. It'd be considerably more localized convolution, though. I'd much rather give struct _longobject a distinct sign member (say, 0 == zero, -1 == non-zero negative, 1 == non-zero positive). That would simplify code. It would cost no extra bytes for some longs, and 8 extra bytes for others (since obmalloc rounds up to a multiple of 8); I don't care about that (e.g., I never use millions of longs simultaneously, but often use a few dozen very big longs simultaneously; the memory difference is in the noise then). Note that longintrepr.h isn't included by Python.h. Only longobject.h is, and longobject.h doesn't reveal the internal structure of longs. IOW, changing the internal layout of longs shouldn't even hurt binary compatibility. The ob_size member of PyObject_VAR_HEAD would also be redeclared as size_t in an ideal world.
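A toy Python model of the proposed layout (an explicit sign in {-1, 0, 1} plus a magnitude stored as base-2**15 digits, least significant first). All names here are invented for illustration; this is not CPython's actual representation:

```python
# Toy sign-magnitude model of the proposed long layout: an explicit
# sign field plus a tuple of base-2**15 digits, least significant first.
SHIFT = 15
BASE = 1 << SHIFT

def to_long(n):
    """Split a Python int into (sign, digits)."""
    sign = (n > 0) - (n < 0)
    digits = []
    n = abs(n)
    while n:
        digits.append(n & (BASE - 1))   # low 15 bits
        n >>= SHIFT
    return sign, tuple(digits)

def from_long(rep):
    """Rebuild the int from (sign, digits)."""
    sign, digits = rep
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return sign * n

for v in (0, 1, -1, 12345, -2**40 + 7):
    assert from_long(to_long(v)) == v
```

The point of the sign member is visible here: the digit array never has to encode negativity, so magnitude code needn't special-case it.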
[Python-Dev] Re: [Python-checkins] python/dist/src/Modules mathmodule.c, 2.74, 2.75
[EMAIL PROTECTED] Modified Files: mathmodule.c Log Message: Add a comment explaining the import of longintrepr.h. Index: mathmodule.c ...

 #include "Python.h"
-#include "longintrepr.h"
+#include "longintrepr.h" // just for SHIFT

The intent is fine, but please use a standard C (not C++) comment. That is, /* ... */, not //.
Re: [Python-Dev] threading (GilState) question
[Michael Hudson] ... Point the first is that I really think this is a bug in the GilState APIs: the readline API isn't inherently multi-threaded and so it would be insane to call PyEval_InitThreads() in initreadline, yet it has to cope with being called in a multithreaded situation. If you can't use the GilState APIs in this situation, what are they for? That's explained in the PEP -- of course wink: http://www.python.org/peps/pep-0311.html Under Limitations and Exclusions it specifically disowns responsibility for worrying about whether Py_Initialize() and PyEval_InitThreads() have been called: This API will not perform automatic initialization of Python, or initialize Python for multi-threaded operation. Extension authors must continue to call Py_Initialize(), and for multi-threaded applications, PyEval_InitThreads(). The reason for this is that the first thread to call PyEval_InitThreads() is nominated as the main thread by Python, and so forcing the extension author to specify the main thread (by forcing her to make this first call) removes ambiguity. As Py_Initialize() must be called before PyEval_InitThreads(), and as both of these functions currently support being called multiple times, the burden this places on extension authors is considered reasonable. That doesn't mean there isn't a clever way to get the same effect anyway, but I don't have time to think about it, and reassigned the bug report to Mark (who may or may not have time).
Re: [Python-Dev] marshal / unmarshal
[Scott David Daniels] What should marshal / unmarshal do with floating point NaNs (the case we are worrying about is Infinity)? The current behavior is not perfect. All Python behavior in the presence of a NaN, infinity, or signed zero is a platform-dependent accident. This is because C89 has no such concepts, and Python is written to the C89 standard. It's not easy to fix across all platforms (because there is no portable way to do so in standard C), although it may be reasonably easy to fix if all anyone cares about is gcc and MSVC (every platform C compiler has its own set of gimmicks for dealing with these things). If marshal could reliably detect a NaN, then of course unmarshal should reliably reproduce the NaN -- provided the platform on which it's unpacked supports NaNs. Should loads raise an exception? Never for a quiet NaN, unless the platform doesn't support NaNs. It's harder to know what to do with a signaling NaN, because Python doesn't have any of 754's trap-enable or exception status flags either (the new ``decimal`` module does, but none of that is integrated with the _rest_ of Python yet). I should note that what the fp literal 1e1 does across boxes is also an accident -- Python defers to the platform C libraries for string-float conversions.
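On a 754 platform with default (non-stop) arithmetic, the special values behave predictably from Python; elsewhere, it's exactly the accident described above. A quick sketch of producing and detecting them:

```python
import math

# On an IEEE-754 box with traps disabled, overflow quietly produces an
# infinity, and inf - inf produces a quiet NaN.
inf = 1e308 * 1e308
nan = inf - inf

# A NaN is the only float unequal to itself; this comparison is the
# classic portable NaN test (math.isnan wraps the same idea today).
assert nan != nan
assert math.isnan(nan)
assert math.isinf(inf)
```

None of this raises an exception on a 754 box: quiet NaNs signal nothing when merely used.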
Re: [Python-Dev] Re: marshal / unmarshal
marshal shouldn't be representing doubles as decimal strings to begin with. All code for (de)serializing C doubles should go thru _PyFloat_Pack8() and _PyFloat_Unpack8(). cPickle (proto >= 1) and struct (std mode) already do; marshal is the oddball. But as the docs (floatobject.h) for these say:

 ...
 * Bug: What this does is undefined if x is a NaN or infinity.
 * Bug: -0.0 and +0.0 produce the same string.
 */
PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);

 ...
 * Bug: What this does is undefined if the string represents a NaN or
 * infinity.
 */
PyAPI_FUNC(double) _PyFloat_Unpack4(const unsigned char *p, int le);
PyAPI_FUNC(double) _PyFloat_Unpack8(const unsigned char *p, int le);
Re: [Python-Dev] Re: marshal / unmarshal
[mwh] OTOH, the implementation has this comment:

/*
 * _PyFloat_{Pack,Unpack}{4,8}. See floatobject.h.
 *
 * TODO: On platforms that use the standard IEEE-754 single and double
 * formats natively, these routines could simply copy the bytes.
 */

Doing that would fix these problems, surely?[1] The 754 standard doesn't say anything about how the difference between signaling and quiet NaNs is represented. So it's possible that a qNaN on one box would look like an sNaN on a different box, and vice versa. But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. Copying bytes works perfectly for all other cases (signed zeroes, non-zero finites, infinities), because their representations are wholly defined, although it's possible that a subnormal on one box will be treated like a zero (with the same sign) on a partially-conforming box. [1] I'm slightly worried about oddball systems that do insane things with the FPU by default -- but don't think the mooted change would make things any worse. Sorry, don't know what that means. The question, of course, is how to tell. Store a few small doubles at module initialization time and stare at their bits. That's enough to settle whether a 754 format is in use, and, if it is, whether it's big-endian or little-endian. ... [2] Exaggeration, I realize -- but how many non 754 systems are out there? How many will see Python 2.5? No idea here. The existing pack routines strive to do a good job of _creating_ an IEEE-754-format representation regardless of platform representation. I assume that code would still be present, so oddball platforms would be left no worse off than they are now.
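The byte-copy idea is observable from Python today: struct's standard-mode 'd' code packs the IEEE-754 image directly, so a pack/unpack roundtrip is effectively the copy under discussion. A sketch showing that signed zeroes, finites, and infinities all survive exactly:

```python
import math
import struct

# struct's '<d' code is the IEEE-754 little-endian double layout, so
# pack/unpack roundtrips are the byte copy discussed above.
for x in (0.0, -0.0, 1.5, -2.5e-300, float('inf'), float('-inf')):
    y, = struct.unpack('<d', struct.pack('<d', x))
    assert y == x
    # copysign distinguishes -0.0 from +0.0, which == cannot.
    assert math.copysign(1.0, y) == math.copysign(1.0, x)

# NaNs can't be compared with ==, but their payload bytes round-trip
# too (for a quiet NaN on mainstream hardware, at least).
nan_bits = struct.pack('<d', float('nan'))
assert struct.pack('<d', struct.unpack('<d', nan_bits)[0]) == nan_bits
```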
Re: [Python-Dev] Re: marshal / unmarshal
[Tim] The 754 standard doesn't say anything about how the difference between signaling and quiet NaNs is represented. So it's possible that a qNaN on one box would look like an sNaN on a different box, and vice versa. But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. [mwh] OK. Do you have any intuition as to whether 754 implementations actually *do* differ on this point? Not anymore -- hasn't been part of my job, or a hobby, for over a decade. There were differences a decade+ ago. All NaNs have all exponent bits set, and at least one mantissa bit set, and every bit pattern of that form represents a NaN. That's all the standard says. The most popular way to distinguish quiet from signaling NaNs keyed off the most-significant mantissa bit: set for a qNaN, clear for an sNaN. It's possible that all 754 HW does that now. There's at least still the wrinkle that Pentium hardware adds a third not-a-number possibility: in addition to 754's quiet and signaling NaNs, it also has indeterminate values. Here w/ native Windows Python 2.4 on a Pentium:

>>> inf = 1e300 * 1e300
>>> inf - inf   # indeterminate
-1.#IND
>>> - _         # but the negation of IND is a quiet NaN
1.#QNAN

Do the same thing under Cygwin Python on the same box and it prints NaN twice. Do people care about this? I don't know. It seems unlikely -- in effect, IND just gives a special string name to a single one of the many bit patterns that represent a quiet NaN. OTOH, Pentium hardware still preserves this distinction, and MS library docs do too. IND isn't part of the 754 standard (although, IIRC, it was part of a pre-standard draft, which Intel implemented and is now stuck with).
Copying bytes works perfectly for all other cases (signed zeroes, non-zero finites, infinities), because their representations are wholly defined, although it's possible that a subnormal on one box will be treated like a zero (with the same sign) on a partially-conforming box. I'd find struggling to care about that pretty hard. Me too. The question, of course, is how to tell. Store a few small doubles at module initialization time and stare at ./configure time, surely? Unsure. Not all Python platforms _have_ ./configure time. Module initialization code is harder to screw up for that reason (the code is in an obvious place then, self-contained, and doesn't require any relevant knowledge of any platform porter unless/until it breaks). their bits. That's enough to settle whether a 754 format is in use, and, if it is, whether it's big-endian or little-endian. Do you have a pointer to code that does this? No. Pemberton's enquire.c contains enough code to do it. Given how few distinct architectures still exist, it's probably enough to store just double x = 1.5 and stare at it. [2] Exaggeration, I realize -- but how many non 754 systems are out there? How many will see Python 2.5? No idea here. The existing pack routines strive to do a good job of _creating_ an IEEE-754-format representation regardless of platform representation. I assume that code would still be present, so oddball platforms would be left no worse off than they are now. Well, yes, given the above. The text this footnote was attached to was asking if just assuming 754 float formats would inconvenience anyone. I think I'm still missing your intent here. If you're asking whether Python can blindly assume that 754 is in use, I'd say that's undesirable but defensible if necessary.
Re: [Python-Dev] Re: marshal / unmarshal
... [mwh] OK, so the worst that could happen here is that moving marshal data from one box to another could turn one sort of NaN into another? Right. Assuming source and destination boxes both use 754 format, and the implementation adjusts endianess if necessary. Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... This doesn't seem very bad. Not bad at all: But since most people run with all FPU traps disabled, and Python doesn't expose a way to read the FPU status flags, they couldn't tell the difference. Store a few small doubles at module initialization time and stare at ./configure time, surely? Unsure. Not all Python platforms _have_ ./configure time. But they all have pyconfig.h. Yes, and then a platform porter has to understand what to #define/#undefine, and why. People doing cross-compilation may have an especially confusing time of it. Module initialization code just works, so I certainly understand why it doesn't appeal to the Unix frame of mind wink. Module initialization code is harder to screw up for that reason (the code is in an obvious place then, self-contained, and doesn't require any relevant knowledge of any platform porter unless/until it breaks). Well, sure, but false negatives here are not a big deal here. Sorry, unsure what false negative means here. ... Something along these lines:

double x = 1.5;
is_big_endian_ieee_double = sizeof(double) == 8 &&
    memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

Right, it's that easy -- at least under MSVC and gcc.
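The same "stare at 1.5" trick works from Python via struct: pack 1.5 in native order and compare against the known big-endian IEEE-754 image of 1.5 (0x3FF8000000000000 -- the same bytes the C snippet spells in octal). A sketch:

```python
import struct
import sys

# The big-endian IEEE-754 image of 1.5: sign 0, exponent 0x3FF,
# mantissa 0x8000000000000.
big_endian_image = bytes([0x3F, 0xF8, 0, 0, 0, 0, 0, 0])
native_image = struct.pack('=d', 1.5)   # native byte order

is_big_endian_ieee_double = native_image == big_endian_image
is_little_endian_ieee_double = native_image == big_endian_image[::-1]

# On an IEEE-754 platform exactly one of the two is true, and it
# agrees with the interpreter's reported byte order.
assert is_big_endian_ieee_double != is_little_endian_ieee_double
assert is_big_endian_ieee_double == (sys.byteorder == 'big')
```

Mixed-endian doubles of the kind Tim half-remembers would fail both comparisons, which is exactly the fallback signal you'd want.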
Re: [Python-Dev] Re: marshal / unmarshal
[Michael Hudson] I've just submitted http://python.org/sf/1180995 which adds format codes for binary marshalling of floats if version > 1, but it doesn't quite have the effect I expected (see below):

>>> inf = 1e308*1e308
>>> nan = inf/inf
>>> marshal.dumps(nan, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: unmarshallable object

I don't understand. Does binary marshalling _not_ mean just copying the bytes on a 754 platform? If so, that won't work. I pointed out the relevant comments before:

/* The pack routines write 4 or 8 bytes, starting at p. ...
 * Bug: What this does is undefined if x is a NaN or infinity.
 * Bug: -0.0 and +0.0 produce the same string.
 */
PyAPI_FUNC(int) _PyFloat_Pack4(double x, unsigned char *p, int le);
PyAPI_FUNC(int) _PyFloat_Pack8(double x, unsigned char *p, int le);

frexp(nan, &e), it turns out, returns nan, This is an undefined case in C89 (all 754 special values are). which results in this (to be expected if you read _PyFloat_Pack8 and know that I'm using a new-ish GCC -- it might be different for MSVC 6). Also (this is the same thing, really): Right. So is pickling with proto >= 1. Changing the pack/unpack routines to copy bytes instead (when possible) fixes all of these things at one stroke, on boxes where it applies.

>>> struct.pack('<d', inf)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: frexp() result out of range

Although I was a little surprised by this:

>>> struct.pack('d', inf)
'\x7f\xf0\x00\x00\x00\x00\x00\x00'

(this is a big-endian system). Again, reading the source explains the behaviour. OK, so the worst that could happen here is that moving marshal data from one box to another could turn one sort of NaN into another? Right. Assuming source and destination boxes both use 754 format, and the implementation adjusts endianess if necessary. Well, I was assuming marshal would do floats little-endian-wise, as it does for integers.
Then on a big-endian 754 system, loads() will have to reverse the bytes in the little-endian marshal bytestring, and dumps() likewise. That's all if necessary meant -- sometimes cast + memcpy isn't enough, and regardless of which direction marshal decides to use. Heh. I have a vague half-memory of _some_ box that stored the two 4-byte words in an IEEE double in one order, but the bytes within each word in the opposite order. It's always something ... I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. Python does very little that depends on internal native byte order, and C hides it in the absence of casting abuse. Copying internal native bytes across boxes is plain ugly -- can't get more brittle than that. In this case it looks like a good tradeoff, though. ... Well, they can always not #define HAVE_IEEE_DOUBLES and not suffer all that much (this is what I meant by false negatives below). ... It just strikes me as silly to test at runtime something that is so obviously not going to change between invocations. But it's not a big deal either way. It isn't to me either. It just strikes me as silly to give porters another thing to wonder about and screw up when it's possible to solve it completely with a few measly runtime cycles wink. Something along these lines:

double x = 1.5;
is_big_endian_ieee_double = sizeof(double) == 8 &&
    memcmp((char*)&x, "\077\370\000\000\000\000\000\000", 8) == 0;

Right, it's that easy Cool. -- at least under MSVC and gcc. Huh? Now it's my turn to be confused (for starters, under MSVC ieee doubles really can be assumed...). So you have no argument with the at least under MSVC part wink. There's nothing to worry about here -- I was just tweaking.
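For the record, this is how the story ended: current CPython's marshal took the binary pack/unpack route, so the 2.4-era ValueError is gone and the special values round-trip. A quick check in modern Python:

```python
import marshal
import math

# In current CPython, marshal serializes floats via the binary
# pack/unpack path, so infinities and NaNs round-trip fine.
inf = float('inf')
nan = float('nan')

assert marshal.loads(marshal.dumps(inf)) == inf
assert marshal.loads(marshal.dumps(-inf)) == -inf
assert math.isnan(marshal.loads(marshal.dumps(nan)))
```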
Re: [Python-Dev] Re: marshal / unmarshal
... [mwh] I recall stories of machines that stored the bytes of long in some crazy order like that. I think Python would already be broken on such a system, but, also, don't care. [Tim] Python does very little that depends on internal native byte order, and C hides it in the absence of casting abuse. [mwh] This surely does:

PyObject *
PyLong_FromLongLong(PY_LONG_LONG ival)
{
    PY_LONG_LONG bytes = ival;
    int one = 1;
    return _PyLong_FromByteArray(
        (unsigned char *)&bytes,
        SIZEOF_LONG_LONG,
        IS_LITTLE_ENDIAN,
        1);
}

Yes, that's casting abuse. Python does very little of that. If it becomes necessary, it's straightforward but long-winded to rewrite the above in wholly portable C (peel the bytes out of ival, least-significant first, via shifting and masking 8 times; ival & 0xff is the least-significant byte regardless of memory storage order; etc). BTW, the IS_LITTLE_ENDIAN macro also relies on casting abuse, and more deeply than does the visible cast there. It occurs to me that in the IEEE case, special values can be detected with reliability -- by picking the exponent field out by force Right, that works for NaNs and infinities; signed zeroes are a bit trickier to detect. -- and a warning emitted or exception raised. Good idea? Hard to say, to me. It's not possible to _create_ a NaN or infinity from finite operands in 754 without signaling some exceptional condition. Once you have one, though, there's generally nothing exceptional about _using_ it. Sometimes there is, like +Inf - +Inf or Inf / Inf, but not generally. Using a quiet NaN never signals; using a signaling NaN almost always signals. So packing a nan or inf shouldn't complain. On a 754 box, unpacking one shouldn't complain either. Unpacking a nan or inf on a non-754 box probably should complain, since there's in general nothing it can be unpacked _to_ that makes any sense (errors should never pass silently).
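Picking the exponent field out by force is easy to sketch in Python; this also shows why signed zeroes are the trickier case (they hide in the sign bit, not the exponent). The function name and labels are made up for illustration:

```python
import struct

def classify(x):
    """Classify a double by picking apart its IEEE-754 bit fields."""
    bits, = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:               # all exponent bits set
        return 'nan' if fraction else 'inf'
    if exponent == 0 and fraction == 0:
        return '-0' if sign else '+0'   # only the sign bit differs
    return 'finite'

assert classify(float('inf')) == 'inf'
assert classify(float('nan')) == 'nan'
assert classify(0.0) == '+0'
assert classify(-0.0) == '-0'
assert classify(1.5) == 'finite'
```

Note the quiet/signaling distinction would live in the most-significant fraction bit, which this sketch deliberately ignores, per the discussion above.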
[Python-Dev] Newish test failures
Seeing three seemingly related test failures today, on CVS HEAD:

test_csv
test test_csv failed -- errors occurred; run in verbose mode for details
test_descr
test test_descr crashed -- exceptions.AttributeError: attribute '__dict__' of 'type' objects is not writable
test_file
test test_file crashed -- exceptions.AttributeError: attribute 'closed' of 'file' objects is not writable

3 tests failed: test_csv test_descr test_file

Drilling into test_csv:

ERROR: test_reader_attrs (test.test_csv.Test_Csv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Code\python\lib\test\test_csv.py", line 62, in test_reader_attrs
    self._test_default_attrs(csv.reader, [])
  File "C:\Code\python\lib\test\test_csv.py", line 58, in _test_default_attrs
    self.assertRaises(TypeError, delattr, obj.dialect, 'quoting')
  File "C:\Code\python\lib\unittest.py", line 320, in failUnlessRaises
    callableObj(*args, **kwargs)
AttributeError: attribute 'quoting' of '_csv.Dialect' objects is not writable

======================================================================
ERROR: test_writer_attrs (test.test_csv.Test_Csv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Code\python\lib\test\test_csv.py", line 65, in test_writer_attrs
    self._test_default_attrs(csv.writer, StringIO())
  File "C:\Code\python\lib\test\test_csv.py", line 58, in _test_default_attrs
    self.assertRaises(TypeError, delattr, obj.dialect, 'quoting')
  File "C:\Code\python\lib\unittest.py", line 320, in failUnlessRaises
    callableObj(*args, **kwargs)
AttributeError: attribute 'quoting' of '_csv.Dialect' objects is not writable
Re: [Python-Dev] PEP 340 -- concept clarification
[Tim] Because Queue does use condvars now instead of plain locks, I wouldn't approve of any gimmick purporting to hide the acquire/release's in put() or get(): that those are visible is necessary to seeing that the _condvar_ protocol is being followed (must acquire() before wait(); must be acquire()'ed during notify(); no path should leave the condvar acquire()d 'for a long time' before a wait() or release()). [Guido] So you think that this would be obscure? A generic condition variable use could look like this:

block locking(self.condvar):
    while not self.items:
        self.condvar.wait()
    self.process(self.items)
    self.items = []

instead of this:

self.condvar.acquire()
try:
    while not self.items:
        self.condvar.wait()
    self.process(self.items)
    self.items = []
finally:
    self.condvar.release()

I find that the block locking version looks just fine; it makes the scope of the condition variable quite clear despite not having any explicit acquire() or release() calls (there are some abstracted away in the wait() call too!). Actually typing it all out like that makes it hard to dislike wink. Yup, that reads fine to me too. I don't think anyone has mentioned this yet, so I will: library writers using Decimal (or more generally HW 754 gimmicks) have a need to fiddle lots of thread-local state (numeric context), and must restore it no matter how the routine exits. Like boost precision to twice the user's value over the next 12 computations, then restore, and no matter what happens here, restore the incoming value of the overflow-happened flag. It's just another instance of temporarily taking over a shared resource, but I think it's worth mentioning that there are a lot of things like that in the world, and to which decorators don't really sanely apply.
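What PEP 340's `block locking(...)` sketches here is essentially what PEP 343's `with` statement later delivered: `threading.Condition` is now a context manager, so the acquire/release pair is implicit but the condvar protocol stays visible. A minimal sketch (the class and method names are invented):

```python
import threading

class ItemBuffer:
    """Minimal producer/consumer buffer following the condvar protocol
    from the message: acquire before wait, hold the lock for notify."""

    def __init__(self):
        self.condvar = threading.Condition()
        self.items = []

    def put(self, item):
        with self.condvar:            # acquire ... release, guaranteed
            self.items.append(item)
            self.condvar.notify()     # lock is held during notify()

    def drain(self):
        with self.condvar:
            while not self.items:     # recheck the predicate after wait()
                self.condvar.wait()
            items, self.items = self.items, []
        return items

buf = ItemBuffer()
t = threading.Thread(target=lambda: buf.put('work'))
t.start()
got = buf.drain()
t.join()
assert got == ['work']
```

The Decimal-context case Tim raises got the same treatment later, via `decimal.localcontext()`.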
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Guido] I'm +1 on accepting this now -- anybody against? I'm curious to know if you (Guido) remember why you removed this feature in Python 0.9.6? From the HISTORY file: New features in 0.9.6: - stricter try stmt syntax: cannot mix except and finally clauses on 1 try IIRC (and I may well not), half of people guessed wrong about whether an exception raised in an except: suite would or would not skip execution of the same-level finally: suite.

try:
    1/0
except ZeroDivisionError:
    2/0
finally:
    print

yes or no? The complementary question is whether an exception in the finally: suite will be handled by the same-level except: suites. There are obvious answers to both, of course. The question is whether they're the _same_ obvious answers across responders 0.7 wink.
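With the unified statement that was eventually adopted, both quiz questions have definite answers, checkable directly:

```python
# Question 1: does the finally: suite run when the except: suite itself
# raises?  Yes -- and the handler's exception still propagates.
log = []
try:
    try:
        1/0
    except ZeroDivisionError:
        log.append('except')
        raise ValueError('from the handler')
    finally:
        log.append('finally')
except ValueError:
    log.append('caught outside')

assert log == ['except', 'finally', 'caught outside']

# Question 2: is an exception raised in finally: handled by the
# same-level except: suites?  No -- handlers only catch exceptions
# raised in the try: suite, so it propagates.
def finally_raises():
    try:
        return 'ok'
    except KeyError:
        return 'handled'          # never reached by the finally's KeyError
    finally:
        raise KeyError('from finally')

try:
    finally_raises()
    raised = False
except KeyError:
    raised = True
assert raised
```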
Re: [Python-Dev] Pre-PEP: Unifying try-except and try-finally
[Shane Holloway] And per the PEP, I think the explaining that::

try:
    A
except:
    B
else:
    C
finally:
    D

is *exactly* equivalent to::

try:
    try:
        A
    except:
        B
    else:
        C
finally:
    D

Resolved all the questions about control flow for me. Well, assuming that implementation makes the explanation truth. ;) Yup! It's not unreasonable to abbreviate it, but the second form is obvious on the face of it, and can already be written. I'm neutral on adding the slightly muddier shortcut.
Re: [Python-Dev] PEP 340 - Remaining issues - keyword
[Guido] ... I wonder how many folks call their action methods do() though. A little Google(tm)-ing suggests it's not all that common, although it would break Zope on NetBSD: http://www.zope.org/Members/tino/ZopeNetBSD I can live with that wink.
Re: [Python-Dev] Tidier Exceptions
[Guido, on string exceptions] ... Last I looked Zope 2 still depended on them (especially in the bowels of ZODB); maybe Tim Peters knows if that's still the case. Certainly none of that in ZODB, or in ZRS. Definitely some in Zope 2.6: http://mail.zope.org/pipermail/zope-tests/2005-May/002110.html I don't think there are any string exceptions in Zope 2.7, Zope 2.8, or Zope 3. Development on Zope 2.6 stopped about a year ago, so the 2.6 story will never change; by the same token, no version of Python after 2.3.5 will ever be approved for use with 2.6 anyway.
Re: [Python-Dev] [Python-checkins] python/nondist/peps pep-0343.txt, 1.11, 1.12
[Raymond Hettinger] ... One more change: The final return +s should be unindented. It should be at the same level as the do with_extra_precision(). The purpose of the +s is to force the result to be rounded back to the *original* precision. This nuance is likely to be the bane of folks who shift back and forth between different levels of precision. Well, a typical user will never change precision most of the time. Of the remaining uses, most will set precision once at the start of the program, and never change it again. Library authors may change precision frequently, but they should be experts. The following example shows the kind of oddity that can arise when working with quantities that have not been rounded to the current precision:

>>> from decimal import getcontext, Decimal as D
>>> getcontext().prec = 3
>>> D('3.104') + D('2.104')
Decimal("5.21")
>>> D('3.104') + D('0.000') + D('2.104')
Decimal("5.20")

I think it shows more why it was a mistake for the decimal constructor to extend the standard (the string->decimal operation in the standard respects context settings; the results differ here because D(whatever) ignores context settings; having a common operation ignore context is ugly and error-prone).
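The pattern under discussion is now spelled with `decimal.localcontext()`, and it makes Raymond's point about the placement of `+s` concrete: the unary plus applies the *outer* context, so it must sit outside the high-precision block. A sketch:

```python
from decimal import Decimal, getcontext, localcontext

getcontext().prec = 3

# Boost precision for the intermediate arithmetic, then round the
# result back to the original precision on the way out.
with localcontext() as ctx:
    ctx.prec = 28
    s = Decimal('3.104') + Decimal('2.104')   # exact at prec 28

result = +s   # unary + rounds s under the restored outer context

assert str(s) == '5.208'
assert str(result) == '5.21'
```

Had `+s` been inside the block, it would have rounded under prec 28 and changed nothing.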
Re: [Python-Dev] Combining the best of PEP 288 and PEP 325: generator exceptions and cleanup
[Guido] ... I think in the past I've unsuccessfully tried to argue that if a cycle contains exactly one object with a Python-invoking finalizer, that finalizer could be invoked before breaking the cycle. I still think that's a sensible proposal, and generators may be the use case to finally implement it. You have argued it, and I've agreed with it. The primary hangup is that there's currently no code capable of doing it. gc currently determines the set of objects that must be part of cyclic trash, or reachable only from cyclic trash, but has no relevant knowledge beyond that. For example, it doesn't know the difference between an object that's in a trash cycle, and an object that's not in a trash cycle but is reachable only from trash cycles. In fact, it doesn't know anything about the cycle structure. That would require some sort of new SCC (strongly connected component) analysis. The graph derived from an arbitrary object graph by considering each SCC to be a node is necessarily a DAG (contains no cycles), and the general way to approach what you want here is to clear trash in a topological sort of the SCC DAG: so long as an SCC contains only one object that may execute Python code, it's safe to run that object's cleanup code first (for a meaning of "safe" that may not always coincide with "explainable" or "predictable" 0.9 wink). gc would probably need to give up after the first such thingie is run, if any SCCs are reachable from the SCC X containing that thingie (gc can no longer be sure that successor SCCs _are_ still trash: they too are reachable from X, so may have been resurrected by the Python code X ran). There's currently no forcing at all of the order in which tp_clear gets called, and currently no analysis done sufficient to support forcing a relevant ordering.
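The analysis Tim describes (collapse SCCs, then topologically sort the condensation DAG) can be sketched in a few dozen lines of Python. This is not CPython's gc -- as the message says, no such code exists there -- just an illustration of the computation it would need; `graphlib` requires Python 3.9+:

```python
from graphlib import TopologicalSorter

def condensation_order(graph):
    """Return the SCCs of a digraph {node: [successors]} in a
    topological order of the condensation DAG (tiny iterative Kosaraju)."""
    # Pass 1: DFS finishing order on the original graph.
    order, seen = [], set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Pass 2: walk the transposed graph in reverse finishing order.
    transpose = {u: [] for u in graph}
    for u, vs in graph.items():
        for v in vs:
            transpose[v].append(u)
    sccs, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        comp, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in transpose[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        sccs.append(frozenset(comp))
    # Topologically sort the condensation (a DAG by construction).
    comp_of = {n: c for c in sccs for n in c}
    dag = {c: {comp_of[v] for u in c for v in graph[u]} - {c} for c in sccs}
    return list(TopologicalSorter(dag).static_order())[::-1]

# A <-> B is a cycle; C hangs off it; D hangs off C.
g = {'A': ['B'], 'B': ['A', 'C'], 'C': ['D'], 'D': []}
sccs = condensation_order(g)
assert sccs[0] == frozenset({'A', 'B'})
assert sccs[-1] == frozenset({'D'})
```

Clearing in this order is exactly the "predecessor SCCs first" discipline the message proposes, with gc giving up after the first finalizer-bearing SCC it clears.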
Re: [Python-Dev] Combining the best of PEP 288 and PEP 325: generator exceptions and cleanup
[Phillip J. Eby] ... However, Tim's new post brings up a different issue: if the collector can't tell the difference between a cycle participant and an object that's only reachable from a cycle, then the mere existence of a generator __del__ will prevent the cycle collection of the entire traceback/frame system that includes a generator-iterator reference anywhere! And that's a pretty serious problem.

It's not that simple <wink>. If an object with a __del__ is not part of a cycle, but is reachable only from trash cycles, that __del__ does not inhibit garbage collection. Like:

    A <-> B -> C -> D -> E

where C, D and E have __del__, but A and B don't, and all are trash. Relatively early on, gc moves C, D, and E into a special "finalizers" list, and doesn't look at this list again until near the end. Then A.tp_clear() and B.tp_clear() are called in some order. As a *side effect* of calling B.tp_clear(), C's refcount falls to 0, and Python's normal refcount-based reclamation (probably) recovers all of C, D and E, and runs their __del__ methods. Note that refcount-based reclamation necessarily follows a DAG order: E is still intact when D.__del__ is called, and likewise D is still intact when C.__del__ is called. It's possible that C.__del__ will resurrect D and/or E, and that D.__del__ will resurrect E. In such cases, D and/or E's refcounts don't fall to 0, and their __del__ methods won't be called then. Cyclic gc doesn't force any of that, though -- it's all a side effect of the clear() in gcmodule.c's:

    if ((clear = op->ob_type->tp_clear) != NULL) {
        Py_INCREF(op);
        clear(op);
        Py_DECREF(op);
    }

In turn, one of A and B gets reclaimed as a side effect of the Py_DECREF there -- it's one of the delights of gcmodule.c that if you don't know the trick, you can stare at it for hours and never discover where exactly it is anything gets released <0.9 wink>.
In fact, it doesn't release anything directly -- all it really does now is break reference cycles, so that Py_DECREF can do its end-of-life thing.
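The DAG-ordered reclamation described above is easy to watch from Python with plain refcounting and no cycles at all. This is CPython-specific behavior, and the `Node` class is just an invented stand-in for the C -> D -> E chain:

```python
# Each object is still intact when its predecessor's __del__ fires:
# dropping C's last reference runs C.__del__, which in turn drops D's
# last reference, and so on down the chain -- DAG order.
reclaimed = []

class Node:
    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor

    def __del__(self):
        reclaimed.append(self.name)

e = Node('E')
d = Node('D', e)
c = Node('C', d)
del c, d, e   # refcounts fall to 0 in C, D, E order (CPython refcounting)
```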
Re: [Python-Dev] Combining the best of PEP 288 and PEP 325: generator exceptions and cleanup
[Phillip J. Eby] Now you've shaken my faith in Uncle Timmy. :)

Now, now, a mere technical matter is no cause for soul-damning heresy!

Seriously, he did *say*: For example, it doesn't know the difference between an object that's in a trash cycle, and an object that's not in a trash cycle but is reachable only from trash cycles. So now I wonder what he *did* mean.

What I said, of course ;-) I hope my later email clarified it. gc knows which trash objects have __del__ methods, and which don't. That's all it needs to know so that a __del__ method on an object that's "not in a trash cycle but is reachable only from trash cycles" will get reclaimed (provided that no __del__ method on a predecessor object that's "not in a trash cycle but is reachable only from trash cycles" resurrects it). gc doesn't know whether the set of objects it _intends_ to call tp_clear on are or aren't in cycles, but all objects directly in __del__-free trash cycles are included in that set. That's enough to ensure that trash hanging off of them sees its refcounts fall to 0 as gc applies tp_clear to the objects in that set. Note that this set can mutate as gc goes along: calling tp_clear on one object can (also as a side effect of refcounts falling to 0) remove any number of other objects from that set (that's why I said "intends" above: there's no guarantee that gc will end up calling tp_clear on any object other than the first one in the set, where "the first" is utterly arbitrary now). If an object in a trash cycle has a __del__ method, this is why the cycle won't be reclaimed: all trash objects with __del__ methods, and everything transitively reachable from them, are moved to the "finalizers" list early on. If that happens to include a trash cycle C, then all of C ends up in the finalizers list, and no amount of tp_clear'ing on the objects that remain can cause the refcount on any object in C to fall to 0. gc has no direct knowledge of cycles in this case either.
Re: [Python-Dev] Adventures with Decimal
[Raymond Hettinger] For brevity, the above example used the context-free constructor, but the point was to show the consequence of a precision change.

Yes, I understood your point. I was making a different point: changing precision isn't needed _at all_ to get surprises from a constructor that ignores context. Your example happened to change precision, but that wasn't essential to getting surprised by feeding strings to a context-ignoring Decimal constructor.

In effect, this creates the opportunity for everyone to get surprised by something only experts should need to deal with. There seems to be an unspoken "wow, that's cool!" kind of belief that because Python's Decimal representation is _potentially_ unbounded, the constructor should build an object big enough to hold any argument exactly (up to the limit of available memory). And that would be appropriate for, say, an unbounded rational type -- and is appropriate for Python's unbounded integers. But Decimal is a floating type with fixed (albeit user-adjustable) precision, and ignoring that mixes arithmetic models in a fundamentally confusing way. I would have no objection to a named method that builds a "big as needed to hold the input exactly" Decimal object, but it shouldn't be the behavior of the everyone-uses-it constructor. It's not an oversight that the IBM standard defines no operations that ignore context (and note that string->float is a standard operation): it's trying to provide a consistent arithmetic, all the way from input to output. Part of consistency is applying the rules everywhere, in the absence of killer-strong reasons to ignore them.

Back to your point, maybe you'd be happier if a named (say) apply_context() method were added? I agree unary plus is a funny-looking way to spell it (although that's just another instance of applying the same rules to all operations).
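With today's decimal module the split under discussion is easy to see: the constructor keeps every digit regardless of context, while unary plus (the standard's apply-the-context operation) rounds to the current precision. A small sketch:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4          # four significant digits of working precision

d = Decimal("1.234567890123456789")
# The constructor ignores context: every digit is stored exactly.
exact = str(d)
# Unary plus applies the context, rounding to 4 significant digits.
rounded = str(+d)
```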
Re: [Python-Dev] Adventures with Decimal
Sorry, I simply can't make more time for this. Shotgun mode:

[Raymond] I have no such thoughts but do strongly prefer the current design.

How can you strongly prefer it? You asked me whether I typed floats with more than 28 significant digits. Not usually <wink>. Do you? If you don't either, how can you strongly prefer a change that makes no difference to what you do?

... The overall design of the module and the spec is to apply context to the results of operations, not their inputs.

But string->float is an _operation_ in the spec, as it has been since 1985 in IEEE-754 too. The float you get is the result of that operation, and is consistent with normal numeric practice going back to the first time Fortran grew a distinction between double and single precision. There too the common practice was to write all literals as double-precision, and leave it to the compiler to round off excess bits if the assignment target was of single precision. That made it easy to change working precision via fiddling a single "implicit" (a kind of type declaration) line. The same kind of thing would be pleasantly applicable for decimal too -- if the constructor followed the rules.

In particular, the spec recognizes that contexts can change, and rather than specifying automatic or implicit context application to all existing values, it provides the unary plus operation so that such an application is explicit. The use of extra digits in a calculation is not invisible as the calculation will signal Rounded and Inexact (if non-zero digits are thrown away).

Doesn't change that the standard rigorously specifies how strings are to be converted to decimal floats, or that our constructor implementation doesn't do that.

One of the original motivating examples was schoolbook arithmetic where the input string precision is incorporated into the calculation.

Sorry, doesn't ring a bell to me. Whose example was this?

IMO, input truncation/rounding is inconsistent with that motivation.
Try keying more digits into your hand calculator than it can hold <0.5 wink>.

Likewise, input rounding runs contrary to the basic goal of eliminating representation error.

It's no surprise that an exact value containing more digits than current precision gets rounded. What _is_ surprising is that the decimal constructor doesn't follow that rule, instead making up its own rule. It's an ugly inconsistency at best.

With respect to integration with the rest of Python (everything beyond that spec but needed to work with it), I suspect that altering the Decimal constructor is fraught with issues such as the string-to-decimal-to-string roundtrip becoming context dependent.

Nobody can have a reasonable expectation that string -> float -> string is an identity for any fixed-precision type across all strings. That's just unrealistic. You can expect string -> float -> string to be an identity if the string carries no more digits than current precision. That's how a bounded type works. Trying to pretend it's not bounded in this one case is a conceptual mess.

I haven't thought it through yet but suspect that it does not bode well for repr(), pickling, shelving, etc.

The spirit of the standard is always to deliver the best possible approximation consistent with current context. Unpickling and unshelving should play that game too. repr() has a special desire for round-trip fidelity.

Likewise, I suspect that traps await multi-threaded or multi-context apps that need to share data.

Like what? Thread-local context precision is a reality here, going far beyond just string->float.

Also, adding another step to the constructor is not going to help the already disastrous performance.

(1) I haven't found it to be a disaster. (2) Over the long term, the truly speedy implementations of this standard will be limited to a fixed set of relatively small precisions (relative to, say, 100, not to 28 <wink>).
In that world it would be unboundedly more expensive to require the constructor to save every bit of every input: rounding string->float is a necessity for speedy operation over the long term.

I appreciate efforts to make the module as idiot-proof as possible.

That's not my interest here. My interest is in a consistent, std-conforming arithmetic, and all fp standards since IEEE-754 recognized that string->float is an operation much like every other fp operation. Consistency helps by reducing complexity. Most users will never bump into this, and experts have a hard enough job without gratuitous deviations from a well-defined spec. What's the _use case_ for carrying an unbounded amount of information into a decimal instance? It's going to get lost upon the first operation anyway.

However, that is a pipe dream. By adopting and exposing the full standard instead of the simpler X3.274 subset, using the module is a non-trivial exercise and, even for experts, is a complete PITA. Rigorous numeric programming is a
Re: [Python-Dev] [Python-checkins] python/nondist/peps pep-0343.txt, 1.11, 1.12
[Greg Ewing] I don't see it's because of that. Even if D(whatever) didn't ignore the context settings, you'd get the same oddity if the numbers came from somewhere else with a different precision.

Most users don't change context precision, and in that case there is no operation defined in the standard that can _create_ a decimal with different precision. Python's Decimal constructor, however, can (Python's Decimal constructor performs an operation that's not in the standard -- it's a Python-unique extension to the standard).

I'm very uncomfortable about the whole idea of a context-dependent precision. It just seems to be asking for trouble.

If you're running on a Pentium box, you're using context-dependent precision a few million times per second. Most users will be as blissfully unaware of decimal's context precision as you are of the Pentium FPU's context precision. Most features in fp standards are there for the benefit of experts. You're not required to change context; those who need such features need them desperately, and don't care whether you think they should <wink>. An alternative is a God-awful API that passes a context object explicitly to every operation. You can, e.g., kiss infix + goodbye then. Some implementations of the standard do exactly that. You might want to read the standard before getting carried off by gut reactions: http://www2.hursley.ibm.com/decimal/
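The thread-local nature of decimal's context -- the analogue of the per-thread FPU control word in the Pentium analogy -- can be demonstrated directly. In the sketch below, each thread adjusts its own precision without disturbing the other (the `work` helper is invented for illustration):

```python
import decimal
import threading

results = {}

def work(prec):
    # Each thread gets its own decimal context; changing prec here
    # does not affect any other thread's arithmetic.
    decimal.getcontext().prec = prec
    results[prec] = str(decimal.Decimal(1) / decimal.Decimal(7))

threads = [threading.Thread(target=work, args=(p,)) for p in (4, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The same division yields answers of different precision in each thread, with no cross-talk.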
Re: [Python-Dev] Adventures with Decimal
[Michael Chermside] Tim, I find Raymond's arguments to be much more persuasive. (And that's even BEFORE I read his 11-point missive.) I understood the concept that *operations* are context-dependent, but decimal *objects* are not, and thus it made sense to me that *constructors* were not context-dependent. On the other hand, I am NOT a floating-point expert. Can you educate me some?

Sorry, I can't make more time for this now. The short course is that a module purporting to implement an external standard should not deviate from that standard without very good reasons, and should make an effort to hide whatever deviations it thinks it needs to indulge (e.g., make them harder to spell). This standard provides 100% portable (across HW, across OSes, across programming languages) decimal arithmetic, but of course that's only across standard-conforming implementations.

That the decimal constructor here deviates from the standard appears to be just an historical accident (despite Raymond's current indefatigable rationalizations <wink>). Other important implementations of the standard didn't make this mistake; for example, Java's BigDecimal(java.lang.String) constructor follows the rules here: http://www2.hursley.ibm.com/decimalj/deccons.html

Does it really need to be argued interminably that deviating from a standard is a Big Deal? Users pay for that eventually, not implementors. Even if a standard is wrong (and leaving aside that I believe this standard asks for the right behavior here), users benefit from cross-implementation predictability a lot more than they can benefit from a specific implementation's non-standard idiosyncrasies.
Re: [Python-Dev] Adventures with Decimal
[Raymond Hettinger] The word "deviate" inaccurately suggests that we do not have a compliant method which, of course, we do. There are two methods, one context-aware and the other context-free. The proposal is to change the behavior of the context-free version, treat it as a bug, and alter it in the middle of a major release.

I didn't suggest changing this for 2.4.2. Although, now that you mention it ... <wink>.

The sole argument resembles bible thumping.

I'm sorry, but if you mentally reduced everything I've written about this to "the sole argument", rational discussion has become impossible here. In the meantime, I've asked Mike Cowlishaw what his intent was, and what the standard may eventually say. I didn't express a preference to him. He said he'll think about it and try to get back to me by Sunday.
Re: [Python-Dev] Adventures with Decimal
[Guido] It looks like if you pass in a context, the Decimal constructor still ignores that context:

    >>> import decimal as d
    >>> d.getcontext().prec = 4
    >>> d.Decimal("1.234567890123456789012345678901234567890123456789", d.getcontext())
    Decimal("1.234567890123456789012345678901234567890123456789")

I think this is contrary to what some here have claimed (that you could pass an explicit context to cause it to round according to the context's precision).

I think Michael Chermside said that's how a particular Java implementation works. Python's Decimal constructor accepts a context argument, but the only use made of it is to possibly signal a ConversionSyntax condition.
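Modern decimal still behaves exactly as observed here: a context passed to the constructor never causes rounding; it is consulted only for how malformed input is signalled. (`Context.create_decimal()`, by contrast, is the constructor that does apply the precision.) A small sketch:

```python
from decimal import Decimal, Context, InvalidOperation

ctx = Context(prec=4)

# Passing a context does NOT make the constructor round:
d = Decimal("1.234567890123456789", ctx)

# The context is consulted only for bad syntax: with the trap disabled,
# a malformed string quietly yields a NaN instead of an exception.
ctx.traps[InvalidOperation] = False
quiet = Decimal("not a number", ctx)

# Context.create_decimal() is the rounding constructor, for comparison.
rounded = ctx.create_decimal("1.234567890123456789")
```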
Re: [Python-Dev] Vestigial code in threadmodule?
[A.M. Kuchling] Looking at bug #1209880, the following function from threadmodule.c is referenced. I think the args==NULL case, which can return None instead of a Boolean value, can never be reached because PyArg_ParseTuple() will fail if args==NULL.

It would assert-fail in a debug build. In a release build the most likely outcome would be a segfault (NULL-pointer dereference in the expansion of vgetargs1's PyTuple_Check(args)).

Before ripping the args==NULL code out, I wanted to be sure my analysis is correct; is there some subtlety here I'm missing that makes args==NULL possible?

Rip it out; blame me <wink>.

--amk

    static PyObject *
    lock_PyThread_acquire_lock(lockobject *self, PyObject *args)

Note that this is a file-local function. The only references are here:

    static PyMethodDef lock_methods[] = {
        {"acquire_lock", (PyCFunction)lock_PyThread_acquire_lock,
         METH_VARARGS, acquire_doc},
        {"acquire", (PyCFunction)lock_PyThread_acquire_lock,
         METH_VARARGS, acquire_doc},

METH_VARARGS always passes a tuple (possibly empty). These are very old functions, so I bet they used to use METH_OLDARGS (implied by absence at the time) ... yup, METH_VARARGS was introduced here in rev 2.48, and unintentionally changed the return contract of this function. So that was a backward incompatibility introduced in Python 2.3a1. Since nobody complained then or since, I vote to keep the new return contract and fiddle the docs to match it.
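The return contract in question -- a real boolean, never None -- is the one lock.acquire() honors to this day, and is easy to check from Python:

```python
import threading

lock = threading.Lock()

got_it = lock.acquire()        # uncontended acquire: returns True
missed = lock.acquire(False)   # non-blocking retry while held: returns False
lock.release()
```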
Re: [Python-Dev] Propose to reject PEP 336 -- Make None Callable
[Raymond Hettinger] After nine months, no support has grown beyond the original poster.

Never will, either -- even Roman numeral literals are more Pythonic than this one. More Pythonic: make integers callable: i(arglist) returns the i'th argument. So, e.g., people who find it inconvenient to index a list like this:

    x[i]

could index it like this instead:

    i(*x)

Punchline: I didn't make this up -- that's how integers work in Icon! Kinda.

    y := 2
    y(x, y, z) := 3

also works to bind `y` to 3. Python is falling _way_ behind <wink>.
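For the record, the joke is implementable: a toy `Ith` class (invented here, not a real proposal) makes an integer callable that returns its i'th argument, 1-based as in Icon:

```python
class Ith(int):
    """A callable integer: i(*args) returns the i'th argument (1-based)."""
    def __call__(self, *args):
        return args[self - 1]

i = Ith(2)
second = i(*['a', 'b', 'c'])   # same as ['a', 'b', 'c'][1]
```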
Re: [Python-Dev] Propose rejection of PEP 303 -- Extend divmod() for Multiple Divisors
About PEP 303, I use divmod for lots (and lots) of things, but I've got no real use for an extended divmod() either. -1: it would be low-use, confusing clutter.

[Barry] Interesting. Just yesterday I wrote a simple stopwatch-like timer script and I found that I needed three divmod calls to convert from seconds into a datetime.time object.

You don't need any divmods for that ...

... Actually, no, because datetime.time(seconds=50227) throws an exception.

That's right: the time class models a time of day, not "seconds from Barry's implied midnight epoch" (or something like that). What you wanted was a timedelta instead. Converting that to a time is then simplicity itself <wink>:

    >>> print (datetime(1, 1, 1) + timedelta(seconds=50227)).time()
    13:57:07

You have to go thru a datetime object first because time objects don't support arithmetic either. That isn't all bad. By going thru a datetime first, it's clear what happens if the number of seconds you feed in exceeds a day's worth. You can check for that or ignore it then, depending on what your app wants (it may or may not be an error, depending on the app).
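The divmod-free conversion above, wrapped as a tiny helper (the function name is invented):

```python
from datetime import datetime, timedelta

def seconds_to_time(seconds):
    # Route through a datetime, since time objects support no arithmetic.
    return (datetime(1, 1, 1) + timedelta(seconds=seconds)).time()

t = seconds_to_time(50227)
```

Feeding in more than a day's worth of seconds silently wraps (the extra days land in the discarded date part), which is exactly the check-it-or-ignore-it choice mentioned above.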
Re: [Python-Dev] is type a usable feature?
[Paolino [EMAIL PROTECTED]] Hello developers, I noticed my application was growing strangely while I was using type, then I tried this:

    while True:
        type('A', (), {})

and saw memory filling up. Is there a clean solution to that? I see it as a bug in python engineering, that is why I wrote to you.

Python bugs should be reported on SourceForge: http://sourceforge.net/tracker/?group_id=5470&atid=105470

Please specify the Python version and OS. I do not see memory growth running the above under Python 2.3.5 or 2.4.1 on Windows, so I don't have any evidence of a bug here in the Pythons I usually use.
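One way to probe reports like this today: dynamically created classes are self-referential, so they are reclaimed by the cyclic collector rather than promptly by refcounting -- steady memory use depends on gc getting a chance to run. A small CPython-specific sketch:

```python
import gc

gc.disable()                    # keep automatic collection out of the way
try:
    gc.collect()                # start from a clean slate
    for _ in range(100):
        type('A', (), {})       # each abandoned class becomes cyclic trash
    freed = gc.collect()        # one sweep reclaims them all
finally:
    gc.enable()
```

If `freed` stayed at 0 while memory grew, that would point at a genuine leak rather than deferred collection.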
Re: [Python-Dev] refcounting vs PyModule_AddObject
[Michael Hudson] I've been looking at this area partly to try and understand this bug: [ 1163563 ] Sub threads execute in restricted mode but I'm not sure the whole idea of multiple interpreters isn't inherently doomed :-/

[Martin v. Löwis] That's what Tim asserts, saying that people who want to use the feature should fix it themselves.

I haven't said they're doomed, although I have said that people who want to use them are on their own. I think the latter is simply an obvious truth, since (a) multiple interpreters have been entirely unexercised by the Python test suite (under "if it's not tested, it's broken" rules); (b) Python developers generally pay no attention to them; and (c), as the teensy bit of docs for them imply, they're an 80% solution, but to a problem that's probably more sensitive than most to glitches in the other 20%. I've also said that Mark's thread state PEP explicitly disavowed responsibility for working nicely with multiple interpreters. I said that only because that's what his PEP said <wink>.
Re: [Python-Dev] [Python-checkins] python/dist/src/Lib Cookie.py, 1.17, 1.18
[EMAIL PROTECTED] Update of /cvsroot/python/python/dist/src/Lib In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4891/Lib

Modified Files: Cookie.py

Log Message: bug [ 1108948 ] Cookie.py produces invalid code

Index: Cookie.py
===
RCS file: /cvsroot/python/python/dist/src/Lib/Cookie.py,v
retrieving revision 1.17
retrieving revision 1.18
diff -u -d -r1.17 -r1.18
--- Cookie.py 20 Oct 2003 14:01:49 - 1.17
+++ Cookie.py 26 Jun 2005 21:02:49 - 1.18
@@ -470,9 +470,9 @@
     def js_output(self, attrs=None):
         # Print javascript
         return """
-        <SCRIPT LANGUAGE="JavaScript">
+        <script type="text/javascript">
         <!-- begin hiding
-        document.cookie = "%s"
+        document.cookie = "%s";
         // end hiding -->
         </script>
         """ % ( self.OutputString(attrs), )

I assume this accounts for the current failure of test_cookie:

test_cookie
test test_cookie produced unexpected output:
**
*** mismatch between line 19 of expected output and line 19 of actual output:
- <SCRIPT LANGUAGE="JavaScript">
+ <script type="text/javascript">
*** mismatch between line 21 of expected output and line 21 of actual output:
- document.cookie = "Customer=WILE_E_COYOTE; Path=/acme; Version=1";
+ document.cookie = "Customer=WILE_E_COYOTE; Path=/acme; Version=1";;
?                                                                   +
*** mismatch between line 26 of expected output and line 26 of actual output:
- <SCRIPT LANGUAGE="JavaScript">
+ <script type="text/javascript">
*** mismatch between line 28 of expected output and line 28 of actual output:
- document.cookie = "Customer=WILE_E_COYOTE; Path=/acme";
+ document.cookie = "Customer=WILE_E_COYOTE; Path=/acme";;
?                                                        +
**
Re: [Python-Dev] List copy and clear (was Re: Inconsistent API forsets.Set and build-in set)
[Tim Peters] Or my personal favorite,

    while mylist:
        del mylist[::2]

Then the original index positions with the most consecutive trailing 1 bits survive the longest, which is important to avoid ZODB cache bugs <wink>.

[Christos Georgiou] This is a joke, hopefully, and in that case, I fell for it. If not, please provide a url with related discussion (for educational purposes :)

Fell for what? It's certainly true that the code snippet allows the original index positions with the most consecutive trailing 1 bits to survive the longest (on the first iteration, all even index positions (no trailing 1 bits) are deleted, and you don't really need a URL to figure out what happens on the i'th iteration). The idea that this is helpful in avoiding anything's cache bugs is purely wink-worthy, though.
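The survival claim is easy to verify (nothing ZODB-related about it, of course):

```python
# Each pass of `del mylist[::2]` deletes the even positions, so an element
# survives pass k only if its ORIGINAL index has at least k trailing 1 bits.
mylist = list(range(16))
passes = []
while mylist:
    passes.append(list(mylist))
    del mylist[::2]
```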
Re: [Python-Dev] Chaining try statements: eltry?
[Guido, on {for,while}/else] ... The question remains whether Python would be easier to learn without them. And if so, the question would remain whether that's offset by their utility for experienced developers. All hard to assess impartially!

That's what I'm here for. I like loop else clauses, but have to admit that (a) I rarely use them; (b) most of the time I use them, my control flow is on the way to becoming so convoluted that I'm going to rewrite the whole function soon anyway; and, (c) I've often misread code that uses them, mentally attaching the else to a nearby if instead. I also suspect that if they weren't in the language already, a PEP to introduce them would fail, because

    still_looking = True
    some loop:
        if found it:
            still_looking = False
            break
    if still_looking:
        # what would have been in the else clause

is clear and easy to write without it.
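Side by side, the flag pattern above and the for/else it stands in for (the helper names are invented):

```python
def find_with_flag(seq, target):
    still_looking = True
    for item in seq:
        if item == target:
            still_looking = False
            break
    if still_looking:
        return "not found"
    return "found"

def find_with_else(seq, target):
    for item in seq:
        if item == target:
            break
    else:              # runs only if the loop finished without a break
        return "not found"
    return "found"
```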
Re: [Python-Dev] Chaining try statements: eltry?
[Guido] OTOH I don't particularly like code that requires flag variables;

Me neither; that's indeed why this one isn't a slam dunk.

they often make me squirm because the same condition (flag) is tested multiple times where it could be tested just once if more sophisticated flow control (e.g. an else clause :) was available.

What burns me (far) more often in practice is simulating a multi-level `break` or `continue`. Then multiple flag variables get introduced, set, and tested multiple times at multiple logical indent levels too. That can also be viewed as stemming from a lack of more sophisticated flow control. One-loop found-it-or-didn't kinds of flag variables have spatial locality that makes them (IME) quite easy to live with, in comparison.

How would a PEP to *remove* this feature fare today?

Easy: it would be rejected, but with a note that it should be reconsidered for Python 3000.

Unhelpfully, your-opposite-in-oh-so-many-ways<wink>-ly y'rs - tim
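One standard escape from multi-level-break flag variables is to push the nested loops into a function, where `return` serves as the break out of every level at once (names invented for illustration):

```python
def find_in_grid(grid, target):
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == target:
                return i, j       # exits both loops at once -- no flags
    return None

pos = find_in_grid([[1, 2], [3, 4]], 3)
```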
Re: [Python-Dev] Chaining try statements: eltry?
[Jeremy Hylton] ... PS Every time I switch between Python and C, I get confused about elif and else if. Mostly goes to show that you don't use Perl much ;-) Of course, in C99, #define elif else if is part of stdlib.h. Or maybe it isn't, and it just should have been? One of those -- or close enough.
Re: [Python-Dev] Linux Python linking with G++?
[Michael Hudson] --with-fpectl, for example. Does anyone lurking here actually use that, know what it does and require the functionality? Inquiring minds want to know. I know what it intends to do: fpectlmodule.c intends to enable the HW FPU divide-by-0, overflow, and invalid operation traps; if any of those traps trigger, raise the C-level SIGFPE signal; and convert SIGFPE to a Python-level FloatingPointError exception. The comments in pyfpe.h explain this best.
Re: [Python-Dev] should doc string content == documentation content?
[Skip] There's a new bug report on SF (#1243553) complaining (that's probably not the right word) that the documentation for cgi.escape available from pydoc isn't as detailed as that in the full documentation. Is there any desire to make the runtime documentation available via pydoc or help() as detailed as the full documentation?

I'm sure there is <wink>, but via a different route: tools to extract text from the full documentation, not to burden docstrings with an impossible task. Channeling Guido, docstrings are best when they have a quick-reference-card feel, more memory aid than textbook. That doesn't mean it wouldn't also be nice to have the textbook online from pydoc/help too, it means that manuals and docstrings serve different purposes.

... While I can fix the isolated case of cgi.escape fairly easily, I'm not inclined to. (I will gladly do it if the sentiment is that picking off such low-hanging fruit is worthwhile.) What do other people think?

The cgi.escape docstring _should_ admit to the optional boolean `quote` argument. I'm not sure why it uses the highfalutin' "SGML entities" either, when the full docs are content to use the plain-folks "HTML-safe" (if anything, I'd expect the full docs to be anal and the docstring to be friendly). But that's criticism of the docstring _as_ a docstring, not criticizing the docstring for, e.g., not mentioning xml.sax.saxutils.quoteattr() too.
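cgi.escape() has since been removed from Python; html.escape() (added in 3.2) is its modern equivalent, and its `quote` argument is exactly the optional flag the docstring should have mentioned -- though note the default flipped from False to True along the way:

```python
from html import escape

plain = escape('<a href="x">', quote=False)   # " passes through untouched
quoted = escape('<a href="x">', quote=True)   # " becomes &quot; (and ' is escaped too)
```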
Re: [Python-Dev] should doc string content == documentation content?
[Tim Lesher] While I agree that docstrings shouldn't be a deep copy of _Python in a Nutshell_, there are definitely some areas of the standard library that could use some help. threading comes to mind immediately. Sure! The way to cure that one is to write better docstrings for threading -- go for it.
Re: [Python-Dev] zlib 1.2.3 is just out
[Scott David Daniels] I'd guess this belongs in 2.5, with a possible retrofit for 2.4.

[Raymond Hettinger] +1 on backporting, but that is up to Anthony.

[Martin v. Löwis] Correct me if I'm wrong - but there isn't much porting to this. AFAICT, this is only relevant for the Windows build (i.e. which version is used in the binaries). For the source distribution, there should be no change (except for PCbuild/readme.txt, which should reflect the version that is used in the Windows build). FWIW, this currently talks about 1.2.1.

[Trent Mick] Here is a patch to do this (attached) that works on the trunk and against the Python-2.4.1.tgz source tarball. Shall I check this into the HEAD and release24-maint?

Definitely on HEAD, almost certainly on 24-maint. The slight uncertainty wrt the latter is due to the slim possibility that they also made this version of zlib incompatible in some way. I doubt that, but I haven't tried it. BTW, the NEWS files should add a blurb saying Python moved to a new zlib, under a Windows section.
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
[Martin v. Löwis] I'd like to see the Python source be stored in Subversion instead of CVS, and on python.org instead of sf.net. To facilitate discussion, I have drafted a PEP describing the rationale for doing so, and the technical procedure to be performed. This is for discussion on python-dev and eventual BDFL pronouncement; if you see a reason not to make such a change, or if you would prefer a different procedure, please speak up. Encouragement and support is welcome, as well :-) Encouragement and support on SVN, undecided on moving to python.org (don't know when SF intends to support SVN, don't have a feel for the state of-- or prospects for ongoing --python.org volunteerism). ... The conversion should be done using the cvs2svn utility, available e.g. in the cvs2svn Debian package. The command for converting the Python repository is

    cvs2svn -q --encoding=latin1 --force-branch=cnri-16-start \
        --force-branch=descr-branch --force-branch=release152p1-patches \
        --force-tag=r16b1 --fs-type=fsfs -s py.svn.new python/python

The command to convert the distutils repository is

    cvs2svn -q --encoding=latin1 --fs-type=fsfs -s dist.svn.new python/distutils

Sample results of this conversion are available at http://www.dcl.hpi.uni-potsdam.de/python/ and http://www.dcl.hpi.uni-potsdam.de/distutils/ I'm sending this to Jim Fulton because he did the conversion of Zope Corp's code base to SVN. Unfortunately, Jim will soon be out of touch for several weeks. Jim, do you have time to summarize the high bits of the problems you hit? IIRC, you didn't find any conversion script at the time capable of doing the whole job as-is. If that's wrong, it would be good to know that too. Other than that, I'd just like to see an explicit mention in the PEP of a plan to preserve the pre-conversion CVS tarball forever.
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
[Jeff Rush] The conversion script isn't perfect and does fail on complex CVS arrangements or where there is extensive history to migrate. However it appears above that Martin has already tried the script out, with success. I'd still like to hear from Jim, as I don't believe all serious problems were identified by eyeball at once. That said, the Python project has made relatively little use of complex (for CVS) concepts like branching, but in Zopeland it wasn't unusual, over long stretches, for someone to make a new branch every day. Ah, before I forget, single repository has worked very well for Zope (which includes top-level Zope2, Zope3, ZODB, ZConfig, zdaemon, ... projects): http://svn.zope.org/ Long URLs don't really get in the way in practice (rarely a need to type one after initial checkout; even svn switch is usually just a tail-end cmdline edit starting from a copy+paste of svn info output).
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
[Tim] Ah, before I forget, single repository has worked very well for Zope (which includes top-level Zope2, Zope3, ZODB, ZConfig, zdaemon, ... projects): http://svn.zope.org/ Long URLs don't really get in the way in practice (rarely a need to type one after initial checkout; even svn switch is usually just a tail-end cmdline edit starting from a copy+paste of svn info output). [Barry] It depends. In my use of svn, I do a lot of cross-branch merging and repo-side tagging. Yup, me too -- between the two of us, we don't have enough fingers to count how many trunks, branches, and tags of ZODB and Zope I have to fiddle with. Those are done with urls and in those cases, long urls can suck. They're all still copy, paste, tail-edit for me, and-- indeed! --having them all in the same repository is what makes just-the-tail editing possible. Merges I do from the cmdline, but repo-side tagging I do with the TortoiseSVN GUI, and the latter gives easy-to-copy/paste/edit URL fields. So switch to Windows for that part ;-) But we may not do a ton of that with the Python project, and besides it might not be important enough to split the directories. Ya, in Python we make a branch about once per release, + once per 5 years when Jeremy underestimates how long it will take to finish a crusade wink.
Re: [Python-Dev] math.fabs redundant?
[Skip] Why does math have an fabs function? Both it and the abs builtin function wind up calling fabs() for floats. abs() is faster to boot. Nothing deep -- the math module supplies everything in C89's standard libm (+ a few extensions), and fabs() is a std C89 libm function. There isn't a clear (to me) reason why one would be faster than the other; sounds accidental. math.fabs() could certainly be made faster: as currently implemented (via math_1), it endures a pile of general-purpose "try to guess whether libm should have set errno" boilerplate, all of which is wasted since no domain or range errors are possible for fabs().
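One observable difference worth noting, regardless of speed: abs() is generic and type-preserving, while math.fabs() always coerces its argument to float. A quick sketch:

```python
import math

# abs() dispatches on the argument's type and preserves it ...
assert abs(-3) == 3 and isinstance(abs(-3), int)
assert abs(-3.5) == 3.5

# ... while math.fabs() always returns a float, even for ints.
assert math.fabs(-3) == 3.0 and isinstance(math.fabs(-3), float)

# abs() also handles types fabs() can't; for complex it's the magnitude.
assert abs(3 + 4j) == 5.0
# math.fabs(3 + 4j) would raise TypeError instead.
```

So math.fabs() isn't quite redundant with abs(); it's the libm function, warts and all.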
Re: [Python-Dev] FW: PEP 8: exception style
[AMK] PEP 8 doesn't express any preference between the two forms of raise statements:

    raise ValueError, 'blah'
    raise ValueError('blah')

I like the second form better, because if the exception arguments are long or include string formatting, you don't need to use line continuation characters because of the containing parens. Grepping through the library code, the first form is in the majority, used roughly 60% of the time. Should PEP 8 take a position on this? If yes, which one? [Raymond Hettinger] If we had to pick one, I would also choose the second form. But why bother inflicting our preference on others? Both forms are readable, so we won't gain anything by dictating a style. Ongoing cruft reduction -- TOOWTDI. The first form was necessary at Python's start because exceptions were strings, and strings aren't callable, and there needed to be _some_ way to spell "and here's the detail associated with the exception". raise grew special syntax to support that need. In a Python without string exceptions, that syntax isn't needed, and becomes (over time) an increasingly obscure way to invoke an ordinary constructor -- ValueError('blah') does exactly the same thing in a raise statement as it does in any other context, and transforming `ValueError, 'blah'` into the former becomes a wart unique to raise statements.
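History sided with AMK: the two-argument spelling died with Python 2, so only the constructor form survives. A sketch of his line-continuation point:

```python
# ValueError(...) means exactly the same thing in a raise statement as in
# any other context, and the containing parens let a long message wrap
# with no backslash continuations -- AMK's original argument.
try:
    raise ValueError(
        "a message long enough that the old two-argument spelling "
        "would have needed an ugly line continuation")
except ValueError as e:
    caught = str(e)

assert "long enough" in caught
```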
Re: [Python-Dev] PEP: Migrating the Python CVS to Subversion
[Trent Mick] ... There are other little things, like not being able to trim the check-in filelist when editing the check-in message. For example, say you have 10 files checked out scattered around the Python source tree and you want to check 9 of those in. This seems dubious, since you're not checking in the state you actually have locally, and you were careful to run the full Python test suite with your full local state ;-) Currently with svn you have to manually specify those 9 to be sure to not include the remaining one. With p4 you just say to check in the whole tree and then remove that one from the list given to you in your editor when entering the check-in message. Not that big of a deal. As a purely theoretical exercise wink, the last time I faced that under SVN, I opened the single file I didn't want to check in in my editor, did svn revert on it from the cmdline, checked in the whole tree, and then hit the editor's save button. This doesn't scale well to skipping 25 of 50, but it's effective enough for 1 or 2.
Re: [Python-Dev] __traceback__ and reference cycles
[Armin Rigo] There are various proposals to add an attribute on exception instances to store the traceback (see PEP 344). A detail not discussed, which I thought of historical interest only, is that today's exceptions try very hard to avoid reference cycles, in particular the cycle 'frame - local variable - traceback object - frame' which was important for pre-GC versions of Python. A clause 'except Exception, e' would not create a local reference to the traceback, only to the exception instance. If the latter grows a __traceback__ attribute, this is no longer true, and every such except clause typically creates a cycle. Of course, we don't care, we have a GC -- do we? Well, there are cases where we do: see the attached program... In my opinion it should be considered a bug of today's Python that this program leaks memory very fast and takes longer and longer to run each loop (each loop takes half a second longer than the previous one!). (I don't know how this bug could be fixed, though.) Spoiling the fun of figuring out what is going on, the reason is that 'e_tb' creates a reference cycle involving the frame of __del__, which keeps a reference to 'self' alive. Python thinks 'self' was resurrected. The next time the GC runs, the cycle disappears, and the refcount of 'self' drops to zero again, calling __del__ again -- and 'self' gets resurrected again by a new cycle. Etc... Note that no cycle actually contains 'self'; they just point to 'self'. In summary, no X instance ever gets freed, but they all have their destructors called over and over again. Attaching a __traceback__ will only make this bug show up more often, as the 'except Exception, e' line in a __del__() method would be enough to trigger it. Not sure what to do about it. I just thought I should share these thoughts (I stumbled over almost this problem in PyPy). I can't think of a Python feature with a higher aggregate braincell_burned / benefit ratio than __del__ methods.
If P3K retains them-- or maybe even before --we should consider taking the Java dodge on this one. That is, decree that henceforth a __del__ method will get invoked by magic at most once on any given object O, no matter how often O is resurrected. It's been mentioned before, but it's at least theoretically backward-incompatible, so it's scary. I can guarantee I don't have any code that would care, including all the ZODB code I watch over these days. For ZODB it's especially easy to be sure of this: the only __del__ method in the whole thing appears in the test suite, verifying that ZODB's object cache no longer gets into an infinite loop when a user-defined persistent object has a brain-dead __del__ method that reloads self from the database. (Interestingly enough, if Python guaranteed to call __del__ at most once, the infinite loop in ZODB's object cache never would have appeared in this case.)
Re: [Python-Dev] __traceback__ and reference cycles
[Tim Peters] If P3K retains them [__del__]-- or maybe even before --we should consider taking the Java dodge on this one. That is, decree that henceforth a __del__ method will get invoked by magic at most once on any given object O, no matter how often O is resurrected. [Phillip J. Eby] How does that help? You have to dig into Armin's example (or read his explanation): every time __del__ is called on one of his X objects, it creates a cycle by binding sys.exc_info()[2] to the local variable `e_tb`. `self` is reachable from that cycle, so self's refcount does not fall to 0 when __del__ returns. The object is resurrected. When cyclic gc next runs, it determines that the cycle is trash, and runs around decref'ing the objects in the cycle. That eventually makes the refcount on the X object fall to 0 again too, but then its __del__ method also runs again, and creates an isomorphic cycle, resurrecting `self` again. Etc. Armin didn't point this out explicitly, but it's important to realize that gc.garbage remains empty the entire time you let his program run. The object with the __del__ method isn't _in_ a cycle here, it's hanging _off_ a cycle, which __del__ keeps recreating. Cyclic gc isn't inhibited by a __del__ on an object hanging off a trash cycle (but not in a trash cycle itself), but in this case it's ineffective anyway. If __del__ were invoked only the first time cyclic gc ran, the original cycle would go away during the next cyclic gc run, and a new cycle would not take its place. Doesn't it mean that we'll have to have some way of keeping track of which items' __del__ methods were called? Yes, by hook or by crook; and yup too, that may be unattractive.
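A reconstruction of the kind of program Armin attached (names here are guesses, not his actual code). On the Python 2 of this era, each gc pass re-ran __del__ forever; modern CPython (PEP 442, 3.4+) took exactly the "Java dodge" proposed above, so today __del__ runs at most once and the loop stops:

```python
import gc
import sys

calls = []

class X:
    def __del__(self):
        calls.append(1)
        try:
            raise ValueError("boom")
        except ValueError:
            # The traceback references this __del__ frame, and the frame's
            # locals include e_tb, giving the cycle
            # frame -> e_tb -> traceback -> frame.  The frame also holds
            # `self`, so `self` is resurrected when __del__ returns.
            e_tb = sys.exc_info()[2]

x = X()
del x          # refcount hits 0; __del__ runs and resurrects `self`
gc.collect()   # the frame/traceback cycle is trash and gets collected ...
gc.collect()
# ... but PEP 442 marks the object finalized, so __del__ is not re-run;
# under pre-442 semantics, len(calls) would grow with every collect().
assert len(calls) == 1
assert not gc.garbage  # the X instance never lands in gc.garbage either
```

Note the X instance itself is never *in* the cycle; it only hangs off it, which is why gc.garbage stays empty throughout.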
Re: [Python-Dev] __traceback__ and reference cycles
[Brett Cannon] Wasn't there talk of getting rid of __del__ a little while ago and instead using weakrefs to functions to handle cleaning up? There was from me, yes, with an eye toward P3K. Is that still feasible? It never was, really. The combination of finalizers, cycles and resurrection is a freakin' mess, even in theory. The way things are right now, Python's weakref gc endcase behavior is even more mystically implementation-driven than its __del__ gc endcase behavior, and nobody has had time to try to dream up a cleaner approach. And if so, would this alleviate the problem? Absolutely wink. The underlying reason for optimism is that weakrefs in Python are designed to, at worst, let *other* objects learn that a given object has died, via a callback function. The weakly referenced object itself is not passed to the callback, and the presumption is that the weakly referenced object is unreachable trash at the time the callback is invoked. IOW, resurrection was obviously impossible, making endcase life very much simpler. That paragraph is from Modules/gc_weakref.txt, and you can read there all about why optimism hasn't worked yet ;-)
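The design Brett describes did eventually land, though much later, as weakref.finalize (Python 3.4). A sketch of the pattern -- the callback never receives the dying object, so resurrection is impossible by construction:

```python
import weakref

class Resource:
    """Stand-in for an object needing cleanup (hypothetical example)."""

log = []
r = Resource()
# The callback gets only the extra args, never a reference to `r` itself,
# so it cannot resurrect `r` -- the property Tim's quoted paragraph relies on.
f = weakref.finalize(r, log.append, "cleaned up")

del r                       # in CPython the callback fires immediately
assert log == ["cleaned up"]
assert not f.alive          # a finalizer runs at most once, by contract
```

Note that firing on `del r` is CPython refcounting behavior; other implementations only guarantee the callback runs by the time the object is reclaimed.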
Re: [Python-Dev] Sourceforge CVS down?
[Neil Schemenauer] I've been getting: ssh: connect to host cvs.sourceforge.net port 22: Connection refused for the past few hours. Their Site News doesn't say anything about downtime. A cvs update doesn't work for me either now. I did finish one sometime before noon (EDT) today, though.
Re: [Python-Dev] PEP 347: Migration to Subversion
[Martin v. Löwis] I have placed a new version of the PEP on http://www.python.org/peps/pep-0347.html ... +1 from me. But, I don't think my vote should count much, and (sorry) Guido's even less: what do the people who frequently check in want? That means people like you (Martin), Michael, Raymond, Walter, Fred. ... plus the release manager(s). BTW, a stumbling block in Zope's conversion to SVN was that the conversion script initially never set svn:eol-style on any file. This caused weeks of problems, as people on Windows got Linux line ends, and people checking in from Windows forced Windows line ends on Linuxheads (CVS defaults to assuming files are text; SVN, to binary). The peculiar workaround at Zope is that we're all encouraged to add something like this to our SVN config file:

    [auto-props]
    # Setting eol-style to native on all files is a trick: if svn
    # believes a new file is binary, it won't honor the eol-style
    # auto-prop.  However, svn considers the request to set eol-style
    # to be an error then, and if adding multiple files with one
    # svn add cmd, svn will stop adding files after the first
    # such error.  A future release of svn will probably consider
    # this to be a warning instead (and continue adding files).
    * = svn:eol-style=native

It would be best if svn:eol-style were set to native during the initial conversion from CVS, on all files not marked binary in CVS.
Re: [Python-Dev] PEP 347: Migration to Subversion
[Michael Hudson] I suppose another question is: when? Between 2.4.2 and 2.5a1 seems like a good opportunity. I guess the biggest job is collection of keys and associated admin? [Martin v. Löwis] I would agree. However, there still is the debate of hosting the repository elsewhere. Some people (Anthony, Guido, Tim) would prefer to pay for it, instead of hosting it on svn.python.org. Not this Tim. I _asked_ whether we had sufficient volunteer resource to host it on python.org, because I didn't know. Barry has since made sufficiently reassuring gurgles on that point, in particular that ongoing maintenance (after initial conversion) for filesystem-flavor SVN is likely in-the-noise level work.