Re: [Python-Dev] Problems with the Python Memory Manager
Travis Oliphant wrote:
> In the long term, what is the status of plans to re-work the Python
> Memory manager to free memory that it acquires (or improve the detection
> of already freed memory locations)?

The Python memory manager does reuse memory that has been deallocated
earlier. There are patches "floating around" that make it return unused
memory to the system (which it currently doesn't).

Regards,
Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Problems with the Python Memory Manager
Travis Oliphant wrote:
> As verified by removing usage of the Python PyObject_MALLOC function, it
> was the Python memory manager that was performing poorly. Even though
> the array-scalar objects were deleted, the memory manager would not
> re-use their memory for later object creation. Instead, the memory
> manager kept allocating new arenas to cover the load (when it should
> have been able to re-use the old memory that had been freed by the
> deleted objects --- again, I don't know enough about the memory manager
> to say why this happened).

One way (I think the only way) this could happen is if:
- the objects being allocated are all smaller than 256 bytes
- when allocating new objects, the requested size was different
  from any other size previously deallocated.

So if you first allocate 1,000,000 objects of size 200, and then
release them, and then allocate 1,000,000 objects of size 208,
the memory is not reused.

If the objects are all of the same size, or all larger than 256 bytes,
this effect does not occur.

Regards,
Martin
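A rough numerical sketch of Martin's scenario (the constants mirror obmalloc's 8-byte size-class granularity and 256-byte small-object threshold of that era; this is a hypothetical model for illustration, not the allocator itself):

```python
# Hypothetical model of obmalloc's size-class bucketing: small
# requests are rounded up into 8-byte classes, and a pool only serves
# a single class while any of its blocks remain in use.
ALIGNMENT = 8
SMALL_REQUEST_THRESHOLD = 256

def size_class_index(nbytes):
    """Map a small request size to its obmalloc size class index."""
    assert 0 < nbytes <= SMALL_REQUEST_THRESHOLD
    return (nbytes - 1) // ALIGNMENT

# Size-200 and size-208 requests land in different classes, so freed
# 200-byte blocks in a still-partially-used pool cannot satisfy the
# later 208-byte requests:
print(size_class_index(200), size_class_index(208))  # → 24 25
```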
Re: [Python-Dev] Problems with the Python Memory Manager
Travis Oliphant wrote:
> So, I now believe that his code (plus the array scalar extension type)
> was actually exposing a real bug in the memory manager itself. In
> theory, the Python memory manager should have been able to re-use the
> memory for the array-scalar instances because they are always the same
> size. In practice, the memory was apparently not being re-used but
> instead new blocks were being allocated to handle the load.

That is really very hard to believe. Most people on this list would
probably agree that obmalloc certainly *will* reuse deallocated memory
if the next request is for the very same size (number of bytes) that
the previously-released object had.

> His code is quite complicated and it is difficult to replicate the
> problem.

That the code is complex would not be so much of a problem: we often
analyse complex code here. It is a problem that the code is not
available, and it would be a problem if the problem were not
reproducible even if you had the code (i.e. if the problem would
sometimes occur, but not the next day when you ran it again).

So if you can, please post the code somewhere, and add a bug report on
sf.net/projects/python.

Regards,
Martin
Re: [Python-Dev] Problems with the Python Memory Manager
Martin v. Löwis wrote:
> One way (I think the only way) this could happen if:
> - the objects being allocated are all smaller than 256 bytes
> - when allocating new objects, the requested size was different
>   from any other size previously deallocated.
>
> So if you first allocate 1,000,000 objects of size 200, and then
> release them, and then allocate 1,000,000 objects of size 208,
> the memory is not reused.
>
> If the objects are all of same size, or all larger than 256 bytes,
> this effect does not occur.

but the allocator should be able to move empty pools between size
classes via the freepools list, right?  or am I missing something?

maybe what's happening here is more like

    So if you first allocate 1,000,000 objects of size 200, and then
    release most of them, and then allocate 1,000,000 objects of size
    208, all memory might not be reused.

?
Re: [Python-Dev] Problems with the Python Memory Manager
Martin v. Löwis wrote:
> That the code is complex would not so much be a problem: we often
> analyse complex code here. It is a problem that the code is not
> available, and it would be a problem if the problem was not
> reproducable even if you had the code (i.e. if the problem would
> sometimes occur, but not the next day when you ran it again).

You can get the version of scipy_core just before the fix that Travis
applied:

    svn co -r 1488 http://svn.scipy.org/svn/scipy_core/trunk

The fix:

    http://projects.scipy.org/scipy/scipy_core/changeset/1489
    http://projects.scipy.org/scipy/scipy_core/changeset/1490

Here's some code that eats up memory with rev1488, but not with the HEAD:

"""
import scipy
a = scipy.arange(10)
for i in xrange(1000):
    x = a[5]
"""

--
Robert Kern
[EMAIL PROTECTED]

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter
Re: [Python-Dev] urlparse brokenness
On Tue, 2005-11-22 at 23:04 -0600, Paul Jimenez wrote:
> It is my assertion that urlparse is currently broken. Specifically, I
> think that urlparse breaks an abstraction boundary with ill effect.
>
> In writing a mailclient, I wished to allow my users to specify their
> imap server as a url, such as 'imap://user:[EMAIL PROTECTED]:port/'. Which
> worked fine. I then thought that the natural extension to support

FWIW, I have a small addition related to this that I think would be
handy to add to the urlparse module. It is a pair of functions
"netlocparse()" and "netlocunparse()" for parsing and unparsing
"user:[EMAIL PROTECTED]:port" netlocs. Feel free to use/add/ignore it...

http://minkirri.apana.org.au/~abo/projects/osVFS/netlocparse.py

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
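For comparison, today's urllib.parse exposes exactly the netloc pieces that netlocparse() provides as attributes (these attributes did not exist in the 2005 urlparse module; the URL below is a made-up example):

```python
# Modern-Python sketch of splitting a "user:password@host:port"
# netloc into its components via the stdlib.
from urllib.parse import urlsplit

u = urlsplit("imap://user:secret@mail.example.com:143/INBOX")
print(u.username, u.password, u.hostname, u.port)
# → user secret mail.example.com 143
```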
Re: [Python-Dev] Problems with the Python Memory Manager
Hi,

On Thu, Nov 24, 2005 at 01:59:57AM -0800, Robert Kern wrote:
> You can get the version of scipy_core just before the fix that Travis
> applied:

Now we can start debugging :-)

> http://projects.scipy.org/scipy/scipy_core/changeset/1490

This changeset alone fixes the small example you provided. However,
compiling python "--without-pymalloc" doesn't fix it, so we can't blame
the memory allocator. That's all I can say; I am rather clueless as to
how the above patch manages to make any difference even without
pymalloc.

A bientot,
Armin
Re: [Python-Dev] Problems with the Python Memory Manager
Hi,

Ok, here is the reason for the leak...

There is in scipy a type called 'int32_arrtype' which inherits from both
another scipy type called 'signedinteger_arrtype', and from 'int'.
Obscure! This is not 100% officially allowed: you are inheriting from
two C types. You're living dangerously!

Now in this case it mostly works as expected, because the parent scipy
type has no field at all, so it's mostly like inheriting from both
'object' and 'int' -- which is allowed, or would be if the bases were
written in the opposite order. But still, something confuses the
fragile logic of typeobject.c. (I'll leave this bit to scipy people to
debug :-)

The net result is that unless you force your own tp_free as in revision
1490, the type 'int32_arrtype' has tp_free set to int_free(), which is
the normal tp_free of 'int' objects. This causes all deallocated
int32_arrtype instances to be added to the CPython free list of integers
instead of being freed!

A bientot,
Armin
Re: [Python-Dev] PEP 302, PEP 338 and imp.getloader (was Re: a Python interface for the AST (WAS: DRAFT: python-dev...)
Phillip J. Eby wrote:
> This isn't hard to implement per se; setuptools for example has a
> 'get_importer' function, and going from importer to loader is simple:

Thanks, I think I'll definitely be able to build something out of that.

> So with the above function you could do something like:
>
>     def get_loader(fullname, path):
>         for path_item in path:
>             try:
>                 loader = get_importer(path_item).find_module(fullname)
>                 if loader is not None:
>                     return loader
>             except ImportError:
>                 continue
>         else:
>             return None
>
> in order to implement the rest.

I think sys.meta_path needs to figure into that before digging through
sys.path, but otherwise the concept seems basically correct.

[NickC]
>> ** I'm open to suggestions on how to deal with argv[0] and __file__.
>> They should be set to whatever __file__ would be set to by the module
>> loader, but the Importer Protocol in PEP 302 doesn't seem to expose
>> that information. The current proposal is a compromise that matches
>> the existing behaviour of -m (which supports scripts like regrtest.py)
>> while still giving a meaningful value for scripts which are not part
>> of the normal filesystem.

[PJE]
> Ugh. Those are tricky, no question. I can think of several simple
> answers for each, all of which are wrong in some way. :)

Indeed. I tried turning to "exec co in d" and "execfile(name, d)" for
guidance, and didn't find any real help there. The only thing they
automatically add to the supplied dictionary is __builtins__. The
consequence is that any code executed using "exec" or "execfile" sees
its name as being "__builtin__" because the lookup for '__name__' falls
back to the builtin namespace. Further, "__file__" and "__loader__"
won't be set at all when using these functions, which may be something
of a surprise for some modules (to say the least).

My current thinking is to actually try to distance the runpy module
from "exec" and "execfile" significantly more than I'd originally
intended. That way, I can explicitly focus on making it look like the
item was invoked from the command line, without worrying about
behaviour differences between this and the exec statement. It also
means runpy can avoid the "implicitly modify the current namespace"
behaviour that exec and execfile currently have.

The basic function runpy.run_code would look like:

    def run_code(code, init_globals=None, mod_name=None,
                 mod_file=None, mod_loader=None):
        """Executes a string of source code or a code object

        Returns the resulting top level namespace dictionary
        """
        # Handle omitted arguments
        if mod_name is None:
            mod_name = ""
        if mod_file is None:
            mod_file = ""
        if mod_loader is None:
            mod_loader = StandardImportLoader(".")
        # Set up the top level namespace dictionary
        run_globals = {}
        if init_globals is not None:
            run_globals.update(init_globals)
        run_globals.update(__name__ = mod_name,
                           __file__ = mod_file,
                           __loader__ = mod_loader)
        # Run it!
        exec code in run_globals
        return run_globals

Note that run_code always creates a new execution dictionary and
returns it, in contrast to exec and execfile. This is so that naively
doing:

    run_code("print 'Hi there!'", globals())

or:

    run_code("print 'Hi there!'", locals())

doesn't trash __name__, __file__ or __loader__ in the current module
(which would be bad).

And runpy.run_module would look something like:

    def run_module(mod_name, run_globals=None, run_name=None,
                   as_script=False):
        loader = _get_loader(mod_name)  # Handle lack of imp.get_loader
        code = loader.get_code(mod_name)
        filename = _get_filename(loader, mod_name)  # Handle lack of protocol
        if run_name is None:
            run_name = mod_name
        if as_script:
            sys.argv[0] = filename
        return run_code(code, run_globals, run_name, filename, loader)

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
---
http://www.boredomandlaziness.org
[Python-Dev] (no subject)
Hi,

I posted this to comp.lang.python, but got no response, so I thought I
would consult the wise people here...

I have encountered a problem with the re module. I have a
multi-threaded program that does lots of regular expression searching,
with some relatively complex regular expressions. Occasionally, events
can conspire to mean that the re search takes minutes. That's bad
enough in and of itself, but the real problem is that the re engine
does not release the interpreter lock while it is running. All the
other threads are therefore blocked for the entire time it takes to do
the regular expression search.

Is there any fundamental reason why the re module cannot release the
interpreter lock, for at least some of the time it is running? The
ideal situation for me would be if it could do most of its work with
the lock released, since the software is running on a multi-processor
machine that could productively do other work while the re is being
processed. Failing that, could it at least periodically release the
lock to give other threads a chance to run?

A quick look at the code in _sre.c suggests that for most of the time,
no Python objects are being manipulated, so the interpreter lock could
be released. Has anyone tried to do that?

Thanks,
Duncan.

--
-- Duncan Grisby --
-- [EMAIL PROTECTED] --
-- http://www.grisby.org --
Re: [Python-Dev] (no subject)
On Thu, 2005-11-24 at 14:11 +, Duncan Grisby wrote:
> Hi,
>
> I posted this to comp.lang.python, but got no response, so I thought I
> would consult the wise people here...
>
> I have encountered a problem with the re module. I have a
> multi-threaded program that does lots of regular expression searching,
> with some relatively complex regular expressions. Occasionally, events
> can conspire to mean that the re search takes minutes. That's bad
> enough in and of itself, but the real problem is that the re engine
> does not release the interpreter lock while it is running. All the
> other threads are therefore blocked for the entire time it takes to do
> the regular expression search.

I don't know if this will help, but in my experience compiling re's
often takes longer than matching them... are you sure that it's the
match and not a compile that is taking a long time? Are you using
pre-compiled re's or are you dynamically generating strings and using
them?

> Is there any fundamental reason why the re module cannot release the
> interpreter lock, for at least some of the time it is running? The
> ideal situation for me would be if it could do most of its work with
> the lock released, since the software is running on a multi processor
> machine that could productively do other work while the re is being
> processed. Failing that, could it at least periodically release the
> lock to give other threads a chance to run?
>
> A quick look at the code in _sre.c suggests that for most of the time,
> no Python objects are being manipulated, so the interpreter lock could
> be released. Has anyone tried to do that?

probably not... not many people would have several-minutes-to-match
re's. I suspect it would be do-able... I suggest you put together a
patch and submit it on SF...

--
Donovan Baarda <[EMAIL PROTECTED]>
http://minkirri.apana.org.au/~abo/
[Python-Dev] Re: Regular expressions
On Thursday 24 November, Donovan Baarda wrote:
> I don't know if this will help, but in my experience compiling re's
> often takes longer than matching them... are you sure that it's the
> match and not a compile that is taking a long time? Are you using
> pre-compiled re's or are you dynamically generating strings and using
> them?

It's definitely matching time. The re's are all pre-compiled.

[...]

> > A quick look at the code in _sre.c suggests that for most of the time,
> > no Python objects are being manipulated, so the interpreter lock could
> > be released. Has anyone tried to do that?
>
> probably not... not many people would have several-minutes-to-match
> re's.
>
> I suspect it would be do-able... I suggest you put together a patch and
> submit it on SF...

The thing that scares me about doing that is that there might be
single-threadedness assumptions in the code that I don't spot. It's the
kind of thing where a patch could appear to work fine, but then
mysteriously fail due to some occasional race condition. Does anyone
know if there is any global state in _sre that would prevent it being
re-entered, or know for certain that there isn't?

Cheers,
Duncan.

--
-- Duncan Grisby --
-- [EMAIL PROTECTED] --
-- http://www.grisby.org --
Re: [Python-Dev] (no subject)
Donovan Baarda wrote:
> I don't know if this will help, but in my experience compiling re's
> often takes longer than matching them... are you sure that it's the
> match and not a compile that is taking a long time? Are you using
> pre-compiled re's or are you dynamically generating strings and using
> them?

patterns with nested repeats can behave badly on certain types of
non-matching input. (each repeat is basically a loop, and if you nest
enough loops things can quickly get out of hand, even if the inner loop
doesn't do much...)
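This effect is easy to demonstrate (the pattern and inputs below are made up for illustration): a nested repeat like (a+)+ against input that can never match forces the engine to try every way of splitting the run of a's between the inner and outer loop, which grows exponentially with the input length.

```python
import re
import time

# (a+)+$ can never match a string ending in 'b', but the engine must
# rule out every partition of the leading a's before giving up; each
# extra 'a' roughly doubles the work.
pattern = re.compile(r"(a+)+$")

for n in (14, 16, 18):
    text = "a" * n + "b"
    start = time.perf_counter()
    assert pattern.match(text) is None
    print(n, time.perf_counter() - start)  # time grows roughly 4x per step of 2
```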
Re: [Python-Dev] Problems with the Python Memory Manager
[Martin v. Löwis]
> One way (I think the only way) this could happen if:
> - the objects being allocated are all smaller than 256 bytes
> - when allocating new objects, the requested size was different
>   from any other size previously deallocated.
>
> So if you first allocate 1,000,000 objects of size 200, and then
> release them, and then allocate 1,000,000 objects of size 208,
> the memory is not reused.

Nope, the memory is reused in this case. While each obmalloc "pool" P
is devoted to a fixed size so long as at least one object from P is in
use, when all objects allocated from P have been released, P can be
reassigned to any other size class.

The comments in obmalloc.c are quite accurate. This particular case is
talked about here:

"""
empty == all the pool's blocks are currently available for allocation

On transition to empty, a pool is unlinked from its usedpools[] list,
and linked to the front of the (file static) singly-linked freepools
list, via its nextpool member. The prevpool member has no meaning in
this case.

Empty pools have no inherent size class: the next time a malloc finds
an empty list in usedpools[], it takes the first pool off of freepools.
If the size class needed happens to be the same as the size class the
pool last had, some pool initialization can be skipped.
"""

Now if you end up allocating a million pools all devoted to 72-byte
objects, and leave one object from each pool in use, then all those
pools remain devoted to 72-byte objects. Wholly empty pools can be (and
do get) reused freely, though.

> If the objects are all of same size, or all larger than 256 bytes,
> this effect does not occur.

If they're larger than 256 bytes, then you see the reuse behavior of
the system malloc/free, about which virtually nothing can be said
that's true across all Python platforms.
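The pool life cycle Tim describes can be sketched with a toy model (a simplification for illustration only, not obmalloc's real data structures):

```python
# Toy model of obmalloc pools: each pool serves one size class while
# any of its blocks is live; only a wholly empty pool goes back on the
# shared freepools list and may be reassigned to a new size class.
BLOCKS_PER_POOL = 4

class Pool:
    def __init__(self):
        self.size_class = None
        self.in_use = 0

freepools = []          # wholly empty pools, reusable for any class
usedpools = {}          # size_class -> pools currently devoted to it

def alloc(size_class):
    for pool in usedpools.get(size_class, []):
        if pool.in_use < BLOCKS_PER_POOL:
            pool.in_use += 1
            return pool
    pool = freepools.pop() if freepools else Pool()
    pool.size_class = size_class
    pool.in_use = 1
    usedpools.setdefault(size_class, []).append(pool)
    return pool

def free(pool):
    pool.in_use -= 1
    if pool.in_use == 0:            # transition to "empty"
        usedpools[pool.size_class].remove(pool)
        pool.size_class = None
        freepools.append(pool)

p1 = alloc(24)          # e.g. 200-byte objects
free(p1)                # pool wholly empty -> moves to freepools
p2 = alloc(25)          # e.g. 208-byte objects
assert p1 is p2         # the empty pool switched size classes

p3 = alloc(24)
assert alloc(24) is p3  # second block lands in the same pool
free(p3)                # one block freed, one still live...
assert p3.size_class == 24   # ...so the pool stays in its class
```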
Re: [Python-Dev] Problems with the Python Memory Manager
Armin Rigo wrote:
> Hi,
>
> Ok, here is the reason for the leak...
>
> There is in scipy a type called 'int32_arrtype' which inherits from both
> another scipy type called 'signedinteger_arrtype', and from 'int'.
> Obscure! This is not 100% officially allowed: you are inheriting from
> two C types. You're living dangerously!

This is allowed because the two types have compatible binary layouts
(in fact the signed integer type is only the PyObject_HEAD).

> Now in this case it mostly works as expected, because the parent scipy
> type has no field at all, so it's mostly like inheriting from both
> 'object' and 'int' -- which is allowed, or would be if the bases were
> written in the opposite order. But still, something confuses the
> fragile logic of typeobject.c. (I'll leave this bit to scipy people to
> debug :-)

This is definitely possible. I've tripped up in this logic before. I
was beginning to suspect that it might have something to do with what
is going on.

> The net result is that unless you force your own tp_free as in revision
> 1490, the type 'int32_arrtype' has tp_free set to int_free(), which is
> the normal tp_free of 'int' objects. This causes all deallocated
> int32_arrtype instances to be added to the CPython free list of integers
> instead of being freed!

I'm not sure this is true. It sounds plausible but I will have to
check. Previously the tp_free should have been inherited as
PyObject_Del for the int32_arrtype. Unless the typeobject.c code copied
the tp_free from the wrong base type, this shouldn't have been the case.

Thanks for the pointers. It sounds like we're getting close. Perhaps
the problem is in typeobject.c

-Travis
Re: [Python-Dev] Problems with the Python Memory Manager
Armin Rigo wrote:
> Hi,
>
> Ok, here is the reason for the leak...
>
> There is in scipy a type called 'int32_arrtype' which inherits from both
> another scipy type called 'signedinteger_arrtype', and from 'int'.
> Obscure! This is not 100% officially allowed: you are inheriting from
> two C types. You're living dangerously!
>
> Now in this case it mostly works as expected, because the parent scipy
> type has no field at all, so it's mostly like inheriting from both
> 'object' and 'int' -- which is allowed, or would be if the bases were
> written in the opposite order. But still, something confuses the
> fragile logic of typeobject.c. (I'll leave this bit to scipy people to
> debug :-)
>
> The net result is that unless you force your own tp_free as in revision
> 1490, the type 'int32_arrtype' has tp_free set to int_free(), which is
> the normal tp_free of 'int' objects. This causes all deallocated
> int32_arrtype instances to be added to the CPython free list of integers
> instead of being freed!

I can confirm that indeed the int32_arrtype object gets the tp_free
slot from its second parent (the python integer type) instead of its
first parent (the new, empty signed integer type). I just did a printf
after PyType_Ready was called to see what the tp_free slot contained,
and indeed it contained the wrong thing. I suspect this may also be
true of the float64_arrtype as well (which inherits from Python's
float type).

What I don't understand is why the tp_free slot from the second base
type got copied over into the tp_free slot of the child. It should
have received the tp_free slot of the first parent, right? I'm still
looking for why that would be the case.

I think, though, Armin has identified the real culprit of the problem.
I apologize for any consternation over the memory manager that may
have taken place. This problem is obviously an issue of dual
inheritance in C. I understand this is not well-tested code, but in
principle it should work correctly, right?

I'll keep looking to see if I made a mistake in believing that the
int32_arrtype should have inherited its tp_free slot from the first
parent and not the second.

-Travis
[Python-Dev] Problems with mro for dual inheritance in C [Was: Problems with the Python Memory Manager]
Armin Rigo wrote:
> Hi,
>
> Ok, here is the reason for the leak...
>
> There is in scipy a type called 'int32_arrtype' which inherits from both
> another scipy type called 'signedinteger_arrtype', and from 'int'.
> Obscure! This is not 100% officially allowed: you are inheriting from
> two C types. You're living dangerously!
>
> Now in this case it mostly works as expected, because the parent scipy
> type has no field at all, so it's mostly like inheriting from both
> 'object' and 'int' -- which is allowed, or would be if the bases were
> written in the opposite order. But still, something confuses the
> fragile logic of typeobject.c. (I'll leave this bit to scipy people to
> debug :-)

Well, I'm stumped on this. Note the method resolution order for the new
scalar array type (exactly as I would expect). Why doesn't the int32
type inherit its tp_free from the earlier types first?

    a = zeros(10)
    type(a[0]).mro()
    [, , , , , , ]
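The shape of the hierarchy can be reproduced with plain Python stand-ins (the names only mimic the scipy types; the real ones are C types, and the C-level slot inheritance is exactly what differs):

```python
# Plain-Python stand-ins for the scipy hierarchy; these reproduce the
# MRO Travis describes, but not the C-level tp_free behaviour, because
# heap types always get the generic alloc/free slots.
class signedinteger:              # empty parent, like signedinteger_arrtype
    pass

class int32(signedinteger, int):  # like int32_arrtype
    pass

print([c.__name__ for c in int32.mro()])
# → ['int32', 'signedinteger', 'int', 'object']
```

The MRO does list the empty parent first; the surprise comes at the C level, where (as Armin notes downthread) slots such as tp_free are inherited from tp_base, which CPython sets to the "solid base" with the largest instance layout — here int — rather than by walking the MRO.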
[Python-Dev] registering unicode codecs
While running regrtest with -R to find reference leaks I found a usage
issue. When a codec is registered it is stored in the interpreter state
and cannot be removed. Since it is stored as a list, if you repeatedly
add the same search function, you will get duplicates in the list and
they can't be removed. This shows up as a reference leak (which it
really isn't) in test_unicode with this code modified from
test_codecs_errors:

    import codecs
    def search_function(encoding):
        def encode1(input, errors="strict"):
            return 42
        return (encode1, None, None, None)

    codecs.register(search_function)

###

Should the search function be added to the search path if it is already
in there? I don't understand the benefit of having duplicate search
functions. Should users have access to the search path (through a
codecs.unregister())? If so, should it search from the end of the list
to the beginning to remove an item? That way the last entry would be
removed rather than the first.

n
Re: [Python-Dev] registering unicode codecs
Neal Norwitz wrote:
> While running regrtest with -R to find reference leaks I found a usage
> issue. When a codec is registered it is stored in the interpreter
> state and cannot be removed. Since it is stored as a list, if you
> repeatedly add the same search function, you will get duplicates in
> the list and they can't be removed. This shows up as a reference leak
> (which it really isn't) in test_unicode with this code modified from
> test_codecs_errors:
>
>     import codecs
>     def search_function(encoding):
>         def encode1(input, errors="strict"):
>             return 42
>         return (encode1, None, None, None)
>
>     codecs.register(search_function)
>
> ###
>
> Should the search function be added to the search path if it is
> already in there? I don't understand a benefit of having duplicate
> search functions.

Me neither :-) I never expected someone to register a search function
more than once, since there's no point in doing so.

> Should users have access to the search path (through a
> codecs.unregister())?

Maybe, but why would you want to unregister a search function?

> If so, should it search from the end of the
> list to the beginning to remove an item? That way the last entry
> would be removed rather than the first.

I'd suggest to raise an exception in case a user tries to register a
search function twice. Removal should be the same as doing
list.remove(), ie. remove the first (and only) item in the list of
search functions.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 24 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] registering unicode codecs
On 11/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> > Should users have access to the search path (through a
> > codecs.unregister())?
>
> Maybe, but why would you want to unregister a search function ?
>
> > If so, should it search from the end of the
> > list to the beginning to remove an item? That way the last entry
> > would be removed rather than the first.
>
> I'd suggest to raise an exception in case a user tries
> to register a search function twice.

This should take care of the testing problem.

> Removal should be the
> same as doing list.remove(), ie. remove the first (and
> only) item in the list of search functions.

Do you recommend adding an unregister()? It's not necessary for this
case.

n
Re: [Python-Dev] registering unicode codecs
Neal Norwitz wrote:
> On 11/24/05, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>
>>> Should users have access to the search path (through a
>>> codecs.unregister())?
>>
>> Maybe, but why would you want to unregister a search function ?
>>
>>> If so, should it search from the end of the
>>> list to the beginning to remove an item? That way the last entry
>>> would be removed rather than the first.
>>
>> I'd suggest to raise an exception in case a user tries
>> to register a search function twice.
>
> This should take care of the testing problem.
>
>> Removal should be the
>> same as doing list.remove(), ie. remove the first (and
>> only) item in the list of search functions.
>
> Do you recommend adding an unregister()? It's not necessary for this
> case.

Not really - I don't see much of a need for this; except maybe if a
codec package wants to replace another codec package. So far no-one has
requested such a feature, so I'd say we don't add .unregister() until a
request for it pops up.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 24 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] Problems with mro for dual inheritance in C [Was: Problems with the Python Memory Manager]
Hi Travis,

On Thu, Nov 24, 2005 at 10:17:43AM -0700, Travis E. Oliphant wrote:
> Why doesn't the int32
> type inherit its tp_free from the early types first?

In your case I suspect that the tp_free is inherited from the tp_base, which is probably 'int'. I don't see how to "fix" typeobject.c, because I'm not sure that there is a solution that would do the right thing in all cases at this level. I would suggest that you just force the tp_alloc/tp_free that you want in your static types instead. That's what occurs, for example, if you build a similar inheritance hierarchy with classes defined in Python: these classes are then 'heap types', so they always get the generic tp_alloc/tp_free before PyType_Ready() has a chance to see them.

Armin
Re: [Python-Dev] SRE should release the GIL (was: no subject)
Duncan Grisby wrote:
> Is there any fundamental reason why the re module cannot release the
> interpreter lock, for at least some of the time it is running? The
> ideal situation for me would be if it could do most of its work with
> the lock released, since the software is running on a multi processor
> machine that could productively do other work while the re is being
> processed. Failing that, could it at least periodically release the
> lock to give other threads a chance to run?

Formally: no; it accesses a Python string/Python unicode object all the time. Now, since all the shared objects it accesses are immutable, likely no harm would be done by releasing the GIL.

I think SRE was originally also intended to operate on array.array objects; this would have caused bigger problems. Not sure whether this is still an issue.

Regards,
Martin
[Python-Dev] reference leaks
There are still a few reference leaks I've been able to identify. I didn't see an obvious solution to these (well, I saw one obvious solution which crashed, so obviously I was wrong).

When running regrtest with -R, here are the ref leaks reported:

    test_codeccallbacks leaked [2, 2, 2, 2] references
    test_compiler leaked [176, 242, 202, 248] references
    test_generators leaked [254, 254, 254, 254] references
    test_tcl leaked [35, 35, 35, 35] references
    test_threading_local leaked [36, 36, 28, 36] references
    test_urllib2 leaked [-130, 70, -120, 60] references

test_compiler and test_urllib2 are probably not real leaks, but data being cached. I'm not really sure if test_tcl is a leak or not, since there's a lot that goes on under the covers; I didn't see anything obvious in _tkinter.c. I have no idea about test_threading_local. I'm pretty certain test_codeccallbacks and test_generators are leaks. Here is code that I gleaned/modified from the tests and causes leaks in the interpreter:

test_codeccallbacks:

    import codecs

    def test_callbacks():
        def handler(exc):
            l = [u"<%d>" % ord(exc.object[pos])
                 for pos in xrange(exc.start, exc.end)]
            return (u"[%s]" % u"".join(l), exc.end)

        codecs.register_error("test.handler", handler)
        # the {} is necessary to cause the leak, {} can hold data too
        codecs.charmap_decode("abc", "test.handler", {})

    # leak from PyUnicode_DecodeCharmap() each time test_callbacks() is called
    test_callbacks()

test_generators:

    from itertools import tee

    def fib():
        def yield_identity_forever(g):
            while 1:
                yield g

        def _fib():
            for i in yield_identity_forever(head):
                yield i

        head, tail, result = tee(_fib(), 3)
        return result

    x = fib()
    # x.next()  leaks from itertools.tee()

The itertools.tee() fix I thought was quite obvious:

    +++ Modules/itertoolsmodule.c (working copy)
    @@ -356,7 +356,8 @@
     {
         if (tdo->nextlink == NULL)
             tdo->nextlink = teedataobject_new(tdo->it);
    -    Py_INCREF(tdo->nextlink);
    +    else
    +        Py_INCREF(tdo->nextlink);
         return tdo->nextlink;
     }

However, this creates problems elsewhere. I think test_heapq crashed when I added this fix. The patch also didn't fix all the leaks, just a bunch of them. So clearly there's more going on that I'm not getting.

n
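For readers unfamiliar with regrtest's -R option: it calls each test several times and reports how many references each run leaves behind (via sys.gettotalrefcount(), which only exists in debug builds). A crude stand-in that works on a normal build, counting gc-tracked objects instead of total references, might look like this in modern Python (leak_delta and the leaky example function are made up for illustration):

```python
import gc

def leak_delta(func, warmup=2, runs=4):
    # Warm up first, so one-time caches (the likely explanation for the
    # test_compiler/test_urllib2 numbers above) fill before measuring.
    for _ in range(warmup):
        func()
    gc.collect()
    deltas = []
    for _ in range(runs):
        before = len(gc.get_objects())
        func()
        gc.collect()
        deltas.append(len(gc.get_objects()) - before)
    return deltas

# A function that keeps appending to module-level state "leaks":
# every run leaves at least one extra tracked object behind.
leaked = []
print(leak_delta(lambda: leaked.append([])))
```

A well-behaved function would report deltas of zero once its caches are warm; a steady positive delta on every run is the signature of a real leak, like the [2, 2, 2, 2] and [254, 254, 254, 254] rows above.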
Re: [Python-Dev] Problems with the Python Memory Manager
Martin v. Löwis wrote:
> Travis Oliphant wrote:
>
>> So, I now believe that his code (plus the array scalar extension
>> type) was actually exposing a real bug in the memory manager itself.
>> In theory, the Python memory manager should have been able to re-use
>> the memory for the array-scalar instances because they are always the
>> same size. In practice, the memory was apparently not being re-used
>> but instead new blocks were being allocated to handle the load.
>
> That is really very hard to believe. Most people on this list would
> probably agree that obmalloc certainly *will* reuse deallocated memory
> if the next request is for the very same size (number of bytes) that
> the previously-released object had.

Yes, I see that it does. This became more clear as all the simple tests I tried failed to reproduce the problem (and I spent some time looking at the code and reading its comments). I just can't figure out another explanation for why the problem went away when I switched to the system malloc, other than some kind of corner case in the Python memory allocator.

>> His code is quite complicated and it is difficult to replicate the
>> problem.
>
> That the code is complex would not so much be a problem: we often
> analyse complex code here. It is a problem that the code is not
> available, and it would be a problem if the problem was not
> reproducible even if you had the code (i.e. if the problem would
> sometimes occur, but not the next day when you ran it again).

The problem was definitely reproducible, on his machine and on the two machines I tried to run it on. It without fail rapidly consumed all available memory.

> So if you can, please post the code somewhere, and add a bug report
> on sf.net/projects/python.

I'll try to do this at some point. I'll have to get permission from him for the actual Python code. The extension modules he used are all publically available (PyMC). I changed the memory allocator in scipy --- which eliminated the problem --- so you'd have to check out an older version of the code from SVN to see the problem.

Thanks for the tips.

-Travis
Re: [Python-Dev] Problems with the Python Memory Manager
Martin v. Löwis wrote:
> Travis Oliphant wrote:
>
>> As verified by removing usage of the Python PyObject_MALLOC function,
>> it was the Python memory manager that was performing poorly. Even
>> though the array-scalar objects were deleted, the memory manager
>> would not re-use their memory for later object creation. Instead, the
>> memory manager kept allocating new arenas to cover the load (when it
>> should have been able to re-use the old memory that had been freed by
>> the deleted objects--- again, I don't know enough about the memory
>> manager to say why this happened).
>
> One way (I think the only way) this could happen is if:
> - the objects being allocated are all smaller than 256 bytes
> - when allocating new objects, the requested size was different
>   from any other size previously deallocated.

In one version of the code I had moved all objects from the Python memory manager to the system malloc *except* the array scalars. The problem still remained, so I'm pretty sure these were the problem. The array scalars are all less than 256 bytes, but they are always the same number of bytes.

> So if you first allocate 1,000,000 objects of size 200, and then
> release them, and then allocate 1,000,000 objects of size 208,
> the memory is not reused.

That is useful information. I don't think his code was doing that kind of thing, but it definitely provides something to check on.

Previously I was using the standard tp_alloc and tp_free methods (I was not setting them, but just letting PyType_Ready fill those slots in with the default values). When I changed these methods to ones that used system free and system malloc, the problem went away. That's why I attribute the issue to the Python memory manager. Of course, it's always possible that I was doing something wrong, but I really did try to make sure I wasn't making a mistake. I didn't do anything fancy with the Python memory allocator. The array scalars all subclass from each other in C, though. I don't see how that could be relevant, but I could be missing something.

-Travis
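Martin's 200-byte/208-byte scenario follows from how pymalloc buckets small requests into size classes. A sketch of the bucketing rule (the constant names follow Objects/obmalloc.c, but the helper itself is illustrative, not CPython code):

```python
SMALL_REQUEST_THRESHOLD = 256  # requests above this go to the system malloc
ALIGNMENT = 8                  # each size class spans 8 bytes

def size_class(nbytes):
    # Freed blocks are pooled per size class, so a freed block can only
    # satisfy a later request that rounds up to the *same* class.
    if nbytes == 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None  # handled by the platform allocator, not pymalloc
    return (nbytes - 1) // ALIGNMENT

# 200-byte and 208-byte objects land in different classes, which is why
# freeing a million 200-byte objects doesn't help a million subsequent
# 208-byte allocations:
print(size_class(200), size_class(208))  # -> 24 25
```

This also explains why objects of a single fixed size, like the array scalars, should normally be reused perfectly: they all hit the same size class.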
Re: [Python-Dev] Regular expressions
This is probably OT for [Python-dev]. I suspect that your problem is not the GIL but is due to something else. Rather than dorking with the interpreter's threading, you probably would be better off rethinking your problem and finding a better way to accomplish your task.

On Thu, 24 Nov 2005, Duncan Grisby wrote:
> On Thursday 24 November, Donovan Baarda wrote:
>
>> I don't know if this will help, but in my experience compiling re's
>> often takes longer than matching them... are you sure that it's the
>> match and not a compile that is taking a long time? Are you using
>> pre-compiled re's or are you dynamically generating strings and using
>> them?
>
> It's definitely matching time. The res are all pre-compiled.
>
> [...]
>
>>> A quick look at the code in _sre.c suggests that for most of the time,
>>> no Python objects are being manipulated, so the interpreter lock could
>>> be released. Has anyone tried to do that?
>>
>> probably not... not many people would have several-minutes-to-match
>> re's.
>>
>> I suspect it would be do-able... I suggest you put together a patch and
>> submit it on SF...
>
> The thing that scares me about doing that is that there might be
> single-threadedness assumptions in the code that I don't spot. It's the
> kind of thing where a patch could appear to work fine, but then
> mysteriously fail due to some occasional race condition. Does anyone
> know if there is any global state in _sre that would prevent it
> being re-entered, or know for certain that there isn't?
>
> Cheers,
>
> Duncan.
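Donovan's question, whether it is the compile or the match that dominates, is easy to settle with timeit. A sketch in modern Python (the pattern and subject string are made up; substitute the real ones):

```python
import re
import timeit

pattern_text = r"\b\w+-\w+\b"      # hypothetical pattern
subject = "pre-compiled text " * 1000  # hypothetical input

# re caches compiled patterns, so purge the cache each round to
# measure a genuine recompile rather than a dictionary lookup.
def compile_once():
    re.purge()
    return re.compile(pattern_text)

compile_time = timeit.timeit(compile_once, number=100)

rx = re.compile(pattern_text)
match_time = timeit.timeit(lambda: rx.findall(subject), number=100)

print("compile: %.4fs  match: %.4fs" % (compile_time, match_time))
```

If the match column dominates with pre-compiled patterns, as Duncan reports, then the compile-time explanation is ruled out and the GIL question stands.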
[Python-Dev] Bug bz2.BZ2File(...).seek(0,2) + patch
Hi, I found a bug in the bz2 Python module. Example:

    import bz2
    f = bz2.BZ2File("test.bz2", "r")
    f.seek(0, 2)
    assert f.tell() != 0

Details and *patch* at:
http://sourceforge.net/tracker/index.php?func=detail&aid=1366000&group_id=5470&atid=105470

Please CC me on all your answers.

Bye, Victor

--
Victor Stinner - student at the UTBM (Belfort, France)
http://www.haypocalc.com/wiki/Accueil
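The expected behavior can be checked end-to-end with a throwaway file; a sketch in modern Python ("test_seek.bz2" is a scratch filename, not the file from the report):

```python
import bz2
import os

data = b"hello world\n" * 100
path = "test_seek.bz2"

# Write a small compressed file, then seek to its end. tell() reports
# the *uncompressed* offset, so after seek(0, 2) it should equal the
# uncompressed size, not 0 (the value the reported bug produced).
with bz2.BZ2File(path, "w") as f:
    f.write(data)

with bz2.BZ2File(path, "r") as f:
    f.seek(0, 2)  # whence=2: seek relative to the end of the stream
    end_offset = f.tell()

os.remove(path)
print(end_offset == len(data))  # -> True
```
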
[Python-Dev] (no subject)
hi, test mail list :)

Regards! [translated from "致 礼!"]

Frank
[EMAIL PROTECTED]
2005-11-25