Re: Multiple interpreters retaining huge amounts of memory
> Each cycle leaks (or loses) 132k, which is a significant hit -- in my
> real program the hit is around 800k/interpreter.
>
> I ran it through purify (after rebuilding python with the puremodule, no
> pymalloc, no optimization, no threads, and debugging), and while the
> results are somewhat ambiguous, it appears that Py_EndInterpreter isn't
> cleaning up:
>
> A) The site module
> B) The builtins module
>
> Is there some way to properly clean these up prior to the end of
> Py_EndInterpreter?

You might be misinterpreting what you are seeing. Can you provide that
test case so that others are able to reproduce your results?

I would guess that the error is in the SWIG module, not in the cleanup
of the site or builtins modules. They should clean up fine. If there
were a systematic error with the cleanup of these modules, the presence
of the SWIG module should be irrelevant.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
RE: Multiple interpreters retaining huge amounts of memory
On the off chance that anyone is still following this:

I've got a relatively simple example of a program that loads 100
interpreters (sequentially) which all load the same swig module, do
something trivial, and exit.

Each cycle leaks (or loses) 132k, which is a significant hit -- in my
real program the hit is around 800k/interpreter.

I ran it through purify (after rebuilding python with the puremodule, no
pymalloc, no optimization, no threads, and debugging), and while the
results are somewhat ambiguous, it appears that Py_EndInterpreter isn't
cleaning up:

A) The site module
B) The builtins module

Is there some way to properly clean these up prior to the end of
Py_EndInterpreter? It seems to zap everything whether or not it can
correctly clean up the interpreter/modules, and after it runs, all
pointers have been destroyed.

Thanks

-----Original Message-----
From: Rhamphoryncus [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 12:38 PM
To: python-list@python.org
Subject: Re: Multiple interpreters retaining huge amounts of memory

On Feb 2, 10:32 pm, Graham Dumpleton <[EMAIL PROTECTED]> wrote:
> The multi interpreter feature has some limitations, but if you know
> what you are doing and your application can be run within those
> limitations then it works fine.

I've been wondering about this for a while. Given the severe
limitations of it, what are the use cases where multiple interpreters
do work? All I can think of is that it keeps separate copies of loaded
python modules, but since you shouldn't be monkey-patching them anyway,
why should you care?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
This message is intended only for the personal and confidential use of the designated recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any review, dissemination, distribution or copying of this message is strictly prohibited.
This communication is for information purposes only and should not be regarded as an offer to sell or as a solicitation of an offer to buy any financial product, an official confirmation of any transaction, or as an official statement of Lehman Brothers. Email transmission cannot be guaranteed to be secure or error-free. Therefore, we do not represent that this information is complete or accurate and it should not be relied upon as such. All information is subject to change without notice. IRS Circular 230 Disclosure: Please be advised that any discussion of U.S. tax matters contained within this communication (including any attachments) is not intended or written to be used and cannot be used for the purpose of (i) avoiding U.S. tax related penalties or (ii) promoting, marketing or recommending to another party any transaction or matter addressed herein. -- http://mail.python.org/mailman/listinfo/python-list
Re: Multiple interpreters retaining huge amounts of memory
On Feb 2, 10:32 pm, Graham Dumpleton <[EMAIL PROTECTED]> wrote:
> The multi interpreter feature has some limitations, but if you know
> what you are doing and your application can be run within those
> limitations then it works fine.

I've been wondering about this for a while. Given the severe
limitations of it, what are the use cases where multiple interpreters
do work? All I can think of is that it keeps separate copies of loaded
python modules, but since you shouldn't be monkey-patching them anyway,
why should you care?
Re: Multiple interpreters retaining huge amounts of memory
> What objects need to be shared across interpreters?
>
> My thought was to add an interpreter number to the PyThreadState
> structure, to increment it when Py_NewInterpreter is called, and to
> keep track of the interpreter that creates each object. On deletion,
> all memory belonging to these objects would be freed.
>
> Thoughts?

That won't work, unless you make *massive* changes to Python.
There are many global objects that are shared across interpreters:
Py_None, Py_True, PyExc_ValueError, PyInt_Type, and so on. They are
just C globals, and there can be only a single one of them.

If you think you can fix that, start by changing Python so that
Py_None is per-interpreter, then continue with PyBaseObject_Type.

Regards,
Martin
RE: Multiple interpreters retaining huge amounts of memory
What objects need to be shared across interpreters?

My thought was to add an interpreter number to the PyThreadState
structure, to increment it when Py_NewInterpreter is called, and to
keep track of the interpreter that creates each object. On deletion,
all memory belonging to these objects would be freed.

Thoughts?

-----Original Message-----
From: "Martin v. Löwis" [mailto:[EMAIL PROTECTED]
Sent: Friday, February 01, 2008 8:34 PM
To: python-list@python.org
Subject: Re: Multiple interpreters retaining huge amounts of memory

> Is there some way to track references per interpreter, or to get the
> memory allocator to set up separate arenas per interpreter so that it
> can remove all allocated memory when the interpreter exits?

No. The multi-interpreter feature doesn't really work, so you are
basically on your own. If you find out what the problem is, please
submit patches to bugs.python.org.

In any case, the strategy you propose (with multiple arenas) would *not*
work, since some objects have to be shared across interpreters.

Regards,
Martin
Re: Multiple interpreters retaining huge amounts of memory
On Feb 4, 10:03 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> >>> It means that
> >>> environment variable separation for changes made unique to a sub
> >>> interpreter is impossible.
> >> That's not really true. You can't use os.environ for that, yes.
>
> > Which bit isn't really true?
>
> The last sentence ("It means that...").
>
> > When you do:
>
> >   os.environ['XYZ'] = 'ABC'
>
> > this results in a corresponding call to:
>
> >   putenv('XYZ=ABC')
>
> Generally true, but not when you did
>
>    os.environ=dict(os.environ)
>
> Furthermore, you can make changes to environment variables
> without changing os.environ, which does allow for environment
> variable separation across subinterpreters.
>
> > As a platform provider and not the person writing the application I
> > can't really do it that way and effectively force people to change
> > their code to make it work. It also isn't just exec that is the issue,
> > as there are other system calls which can rely on the environment
> > variables.
>
> Which system calls specifically?

For a start os.system(). The call itself may not rely on environment
variables, but users can expect environment variables they set in
os.environ to be inherited by the program they are running.

There would similarly be issues with use of popen2 module functionality
because it doesn't provide a means of specifying a user specific
environment and just inherits that of the current process.

Yes you could rewrite all these with execve in some way, but as I said
it isn't something you can really enforce on someone, especially when
they might be using a third party package which is doing it and it
isn't even their own code.

> > It is also always hard when you aren't yourself having the problem and
> > you are relying on others to try and debug their problem for you. More
> > often than not the amount of information they provide isn't that good
> > and even when you ask them to try specific things for you to test out
> > ideas, they don't. So often one can never uncover the true problem,
> > and it has thus become simpler to limit the source of potential
> > problems and just tell them to avoid doing it. :-)
>
> You do notice that my comment in that direction (avoid using multiple
> interpreters) started that subthread, right :-?

I was talking about avoiding use of different versions of a C extension
module in different sub interpreters, not multiple sub interpreters as
a whole.

Graham
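For code that one does control, the "rewrite with execve in some way"
option mentioned above can be sketched with the standard subprocess
module, which accepts an explicit env mapping. This is only an
illustration and not code from the thread; run_with_env and the
DEMO_XYZ variable are hypothetical names.

```python
import os
import subprocess
import sys


def run_with_env(args, overrides):
    """Run a child process with extra variables, leaving os.environ alone."""
    # Start from the current environment so PATH and friends survive.
    env = dict(os.environ)
    env.update(overrides)
    # env= is handed to the child only; os.environ is never modified here.
    result = subprocess.run(args, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


# The child sees DEMO_XYZ; the parent's os.environ is left untouched.
child_code = "import os; print(os.environ.get('DEMO_XYZ'))"
seen_by_child = run_with_env([sys.executable, "-c", child_code],
                             {"DEMO_XYZ": "ABC"})
seen_by_parent = os.environ.get("DEMO_XYZ")
```

As the message points out, this does nothing for a third party package
that calls os.system() itself; such a call still sees the process-wide
environment.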
Re: Multiple interpreters retaining huge amounts of memory
>>> It means that
>>> environment variable separation for changes made unique to a sub
>>> interpreter is impossible.
>> That's not really true. You can't use os.environ for that, yes.
>
> Which bit isn't really true?

The last sentence ("It means that...").

> When you do:
>
>   os.environ['XYZ'] = 'ABC'
>
> this results in a corresponding call to:
>
>   putenv('XYZ=ABC')

Generally true, but not when you did

   os.environ=dict(os.environ)

Furthermore, you can make changes to environment variables
without changing os.environ, which does allow for environment
variable separation across subinterpreters.

> As a platform provider and not the person writing the application I
> can't really do it that way and effectively force people to change
> their code to make it work. It also isn't just exec that is the issue,
> as there are other system calls which can rely on the environment
> variables.

Which system calls specifically?

> It is also always hard when you aren't yourself having the problem and
> you are relying on others to try and debug their problem for you. More
> often than not the amount of information they provide isn't that good
> and even when you ask them to try specific things for you to test out
> ideas, they don't. So often one can never uncover the true problem,
> and it has thus become simpler to limit the source of potential
> problems and just tell them to avoid doing it. :-)

You do notice that my comment in that direction (avoid using multiple
interpreters) started that subthread, right :-?

Regards,
Martin
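The difference between the two cases can be observed from a single
interpreter. The sketch below assumes a POSIX shell behind os.system();
child_sees and the DEMO_* variable names are hypothetical. It sets one
variable through the normal os.environ wrapper and one after os.environ
has been replaced by a plain dict, then asks a child process which of
them it inherited.

```python
import os
import sys


def child_sees(name, value):
    """Return True if a child started via os.system() sees name=value."""
    # os.system() goes through the C library, so the child inherits the
    # C-level environment, whatever os.environ currently refers to.
    check = ("import os, sys; "
             "sys.exit(0 if os.environ.get(%r) == %r else 1)" % (name, value))
    return os.system('"%s" -c "%s"' % (sys.executable, check)) == 0


real_environ = os.environ
try:
    os.environ["DEMO_SHARED"] = "1"      # wrapper also calls putenv()
    shared_visible = child_sees("DEMO_SHARED", "1")

    os.environ = dict(real_environ)      # plain dict: no putenv() any more
    os.environ["DEMO_PRIVATE"] = "1"
    private_visible = child_sees("DEMO_PRIVATE", "1")
finally:
    os.environ = real_environ            # put the real wrapper back
    os.environ.pop("DEMO_SHARED", None)  # also removes it at the C level
```

Under these assumptions the first variable reaches the child and the
second does not, which is why the dict approach has to be paired with
an explicit environment passed to execve or similar.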
Re: Multiple interpreters retaining huge amounts of memory
On Feb 4, 7:13 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > You might also read section 'Application Environment Variables' of
> > that document. This talks about the problem of leakage of environment
> > variables between sub interpreters. There probably isn't much that one
> > can do about it as one needs to push changes to os.environ into C
> > environment variables so various system library calls will get them,
> > but still quite annoying that the variables set in one interpreter
> > then show up in interpreters created after that point. It means that
> > environment variable separation for changes made unique to a sub
> > interpreter is impossible.
>
> That's not really true. You can't use os.environ for that, yes.

Which bit isn't really true?

When you do:

  os.environ['XYZ'] = 'ABC'

this results in a corresponding call to:

  putenv('XYZ=ABC')

as well as setting the value in the os.environ dictionary.

  >>> os.environ.__class__

  class _Environ(UserDict.IterableUserDict):
      def __setitem__(self, key, item):
          putenv(key, item)
          self.data[key] = item

Because os.environ is set from the current copy of the C environ at the
time the sub interpreter is created, a sub interpreter created at a
later point will have XYZ show up in os.environ of that sub interpreter.

> However,
> you can pass explicit environment dictionaries to, say, os.execve. If
> some library relies on os.environ, you could hack around this aspect
> and do
>
>    os.environ = dict(os.environ)
>
> Then you can customize it. Of course, changes to this dictionary now
> won't be reflected into the C library's environ, so you'll have to
> use execve now (but you should do so anyway in a multi-threaded
> application with changing environments).

As a platform provider and not the person writing the application I
can't really do it that way and effectively force people to change
their code to make it work. It also isn't just exec that is the issue,
as there are other system calls which can rely on the environment
variables.

The only half reasonable solution I have ever been able to dream up is
that, just prior to first initialising Python, a snapshot of the C
environment is taken, and as sub interpreters are created os.environ is
replaced with a new instance of the _Environ wrapper which uses the
initial snapshot rather than what the environment is at the time. At
least then each sub interpreter gets a clean copy of what existed when
the process first started. Even this isn't really a solution though, as
changes to os.environ by sub interpreters still end up getting
reflected in the C environment and so the C environment becomes an
accumulation of settings from different code sets with a potential for
conflict at some point.

Luckily this issue hasn't presented itself as big enough of a problem
at this point to really be concerned.

> > First is that one can't use different versions of a C extension module
> > in different sub interpreters. This is because the first one loaded
> > effectively gets priority.
>
> That's not supposed to happen, AFAICT. The interpreter keeps track of
> loaded extensions by file name, so if the different version lives in
> a different file, that should work fine.
>
> Are you using sys.setdlopenflags by any chance? Setting the flags
> to RTLD_GLOBAL could have that effect; you'd get the init function
> of the first module always. By default, Python uses RTLD_LOCAL,
> so it should be able to keep the different versions apart (on
> Unix with libdl; on Windows, symbol resolution is per-DLL anyway).

That may be true, but I have seen enough people raise strange problems
that I at least counsel people not to rely on being able to import
different versions in different sub interpreters. The problems may well
just fall into the other categories we have been discussing.

Within Apache at least, another source of problems which can arise is
that Apache, or other Apache modules (eg. PHP), can directly link to
shared libraries where they are then loaded at global context. Even if
a Python module tries to isolate itself, one can still end up with
conflicts between the version of a shared library that the module may
want to use and what something else has already loaded. The loader
scope doesn't always protect against this.

It is also always hard when you aren't yourself having the problem and
you are relying on others to try and debug their problem for you. More
often than not the amount of information they provide isn't that good
and even when you ask them to try specific things for you to test out
ideas, they don't. So often one can never uncover the true problem,
and it has thus become simpler to limit the source of potential
problems and just tell them to avoid doing it. :-)

Graham
Re: Multiple interpreters retaining huge amounts of memory
>> - objects can easily get shared across interpreters, and often are.
>>   This is particularly true for static variables that extensions keep,
>>   and for static type objects.
>
> Yep, but basically a problem with how people write C extension
> modules. Ie., they don't write them with the fact that multiple
> interpreters can be used in mind.

I still consider it a bug in Python, and the multiple-interpreter
feature, not so much in the extension modules. Of course, they
may have bugs on top of that, but in general, they have no way
of cleaning up when an interpreter shuts down (until PEP 3121
gets implemented).

> Some details about this in section 'Multiple Python Sub Interpreters'
> of:
>
>   http://code.google.com/p/modwsgi/wiki/ApplicationIssues

A common concern is that people think that the multiple-interpreters
feature is a security mechanism, i.e. works as a sandbox. Maybe that's
more a communication problem than an actual problem with the feature,
however, it can't be emphasized enough that the feature is *not* a
security mechanism: it is possible to get at all objects even of
"other" interpreters.

> You might also read section 'Application Environment Variables' of
> that document. This talks about the problem of leakage of environment
> variables between sub interpreters. There probably isn't much that one
> can do about it as one needs to push changes to os.environ into C
> environment variables so various system library calls will get them,
> but still quite annoying that the variables set in one interpreter
> then show up in interpreters created after that point. It means that
> environment variable separation for changes made unique to a sub
> interpreter is impossible.

That's not really true. You can't use os.environ for that, yes. However,
you can pass explicit environment dictionaries to, say, os.execve. If
some library relies on os.environ, you could hack around this aspect
and do

   os.environ = dict(os.environ)

Then you can customize it. Of course, changes to this dictionary now
won't be reflected into the C library's environ, so you'll have to
use execve now (but you should do so anyway in a multi-threaded
application with changing environments).

> There is another problem with deleting interpreters and then creating
> new ones. This is where a C extension module doesn't declare reference
> counts to static Python objects it creates.

Right - that's a clear bug in the module, though. If the Python
documentation is not sufficiently clear about the requirement that
_every_ assignment to a PyObject* needs to be accompanied with a
Py_INCREF, feel free to contribute patches to make that more clear.

> I don't know whether it is a fundamental problem with the tool or how
> people use it, but Pyrex generated code seems to also do this.

I've never used Pyrex myself, but I would be surprised if it really
had such a severe refcounting error.

>> - the mechanism of PEP 311 doesn't work for multiple interpreters.
>
> Yep, and since SWIG defaults to using it, it means that SWIG generated
> code can't be used in anything but the main interpreter. Subversion
> bindings seem to possibly have a lot of issues related to this as
> well.

Please understand that, when this PEP was written, this issue was
explicitly discussed, and developers explicitly agreed "the
multi-interpreters feature is broken, anyway, so don't let that issue
stop us from providing PEP 311".

> First is that one can't use different versions of a C extension module
> in different sub interpreters. This is because the first one loaded
> effectively gets priority.

That's not supposed to happen, AFAICT. The interpreter keeps track of
loaded extensions by file name, so if the different version lives in
a different file, that should work fine.

Are you using sys.setdlopenflags by any chance? Setting the flags
to RTLD_GLOBAL could have that effect; you'd get the init function
of the first module always. By default, Python uses RTLD_LOCAL,
so it should be able to keep the different versions apart (on
Unix with libdl; on Windows, symbol resolution is per-DLL anyway).

Kind regards,
Martin
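Whether RTLD_GLOBAL is in effect can be checked from Python itself. The
sketch below assumes a Unix-like platform, where sys.getdlopenflags(),
sys.setdlopenflags() and os.RTLD_GLOBAL are available; the helper name
loads_extensions_globally is hypothetical.

```python
import os
import sys


def loads_extensions_globally():
    """Report whether new extension modules would be opened with RTLD_GLOBAL."""
    # These flags are what the interpreter passes to dlopen() on import.
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)


saved_flags = sys.getdlopenflags()
try:
    # Force the flag on, then off, to show what the helper reports.
    sys.setdlopenflags(saved_flags | os.RTLD_GLOBAL)
    with_global = loads_extensions_globally()
    sys.setdlopenflags(saved_flags & ~os.RTLD_GLOBAL)
    without_global = loads_extensions_globally()
finally:
    sys.setdlopenflags(saved_flags)      # leave the process as we found it
```

Calling the helper just before importing the extension in question
would show whether some earlier code has switched the flags.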
Re: Multiple interpreters retaining huge amounts of memory
Nice to see that your comments do come from some understanding of the
issues. There have been a number of times in the past when people have
gone off saying things about multiple interpreters, didn't really know
what they were talking about and were just echoing what someone else
had said. Some of the things being said were often just wrong though.
It just gets annoying. :-(

Anyway, a few comments below with pointers to some documentation on
various issues, plus details of other issues I know of.

On Feb 3, 6:38 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > If you are going to make a comment such as 'multi-interpreter feature
> > doesn't really work' you really should substantiate it by pointing to
> > where it is documented what the problems are or enumerate yourself
> > exactly what the issues are. There is already enough FUD being spread
> > around about the ability to run multiple sub interpreters in an
> > embedded Python application, so adding more doesn't help.
>
> I don't think the limitations have been documented in a systematic
> manner. Some of the problems I know of are:
> - objects can easily get shared across interpreters, and often are.
>   This is particularly true for static variables that extensions keep,
>   and for static type objects.

Yep, but basically a problem with how people write C extension
modules. Ie., they don't write them with the fact that multiple
interpreters can be used in mind.

Until code was fixed recently in trunk, one high profile module which
had this sort of problem was psycopg2. Not sure if there has been an
official release yet which includes the fix. From memory the problem
they had was that a static variable was caching a reference to the
type object for Decimal from the interpreter which first loaded and
initialised the module. That type object was then used to create
instances of Decimal type which were passed to other interpreters.
These Decimal instances would then fail isinstance() checks within
those other interpreters.
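The isinstance() failure can be imitated without sub interpreters at
all. The sketch below is an analogy rather than the psycopg2 code: it
builds two independent copies of a small module, standing in for the
per-interpreter copies of the decimal module, and the Decimal class in
SOURCE is a hypothetical stand-in for the real one.

```python
import types

SOURCE = """
class Decimal:
    def __init__(self, value):
        self.value = value
"""


def load_copy(name):
    """Build an independent module object from SOURCE."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


# Each sub interpreter imports its own copy of a pure Python module.
first = load_copy("decimal_in_interp_1")
second = load_copy("decimal_in_interp_2")

# An extension caches the type object it saw first ...
cached_type = first.Decimal
value = cached_type("1.5")

# ... and hands instances to code that imported its own copy.
ok_in_first = isinstance(value, first.Decimal)
ok_in_second = isinstance(value, second.Decimal)
```

The two classes have the same name and the same source, but they are
distinct objects, so the instance only passes the check against the
copy it was created from.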
Some details about this in section 'Multiple Python Sub Interpreters'
of:

  http://code.google.com/p/modwsgi/wiki/ApplicationIssues

That section of documentation also highlights some of the other errors
that can arise where file objects in particular are somehow shared
between interpreters, plus issues when unmarshalling data.

You might also read section 'Application Environment Variables' of
that document. This talks about the problem of leakage of environment
variables between sub interpreters. There probably isn't much that one
can do about it as one needs to push changes to os.environ into C
environment variables so various system library calls will get them,
but still quite annoying that the variables set in one interpreter
then show up in interpreters created after that point. It means that
environment variable separation for changes made unique to a sub
interpreter is impossible.

> - Py_EndInterpreter doesn't guarantee that all objects are released,
>   and may leak. This is the problem that the OP seems to have.
>   All it does is to clear modules, sys, builtins, and a few other
>   things; it is then up to reference counting and the cycle GC
>   whether this releases all memory or not.

There is another problem with deleting interpreters and then creating
new ones. This is where a C extension module doesn't declare reference
counts to static Python objects it creates. When the interpreter is
destroyed and objects that can be destroyed are destroyed, then it may
destroy these objects which are referenced by the static variables.
When a subsequent interpreter is created which tries to use the same C
extension module, that static variable now contains a dangling invalid
pointer to unused or reused memory.

PEP 3121 could help with this by making it more obvious what
requirements exist on C extension modules to cope with such issues.

I don't know whether it is a fundamental problem with the tool or how
people use it, but Pyrex generated code seems to also do this. This
was showing up in PyProtocols in particular when attempts were made to
recycle interpreters within the lifetime of a process. Other packages
having the problem were psycopg2 again, lxml and possibly subversion
bindings. Some details on this can be found in section 'Reloading
Python Interpreters' of that document.

> - the mechanism of PEP 311 doesn't work for multiple interpreters.

Yep, and since SWIG defaults to using it, it means that SWIG generated
code can't be used in anything but the main interpreter. Subversion
bindings seem to possibly have a lot of issues related to this as
well. Some details on this can be found in section 'Python Simplified
GIL State API' of that document.

> > Oh, it would also be nice to know exactly what embedded systems you
> > have developed which make use of multiple sub interpreters so we can
> > gauge with what standing you have to make such a comment.
>
> I have never used that feature myself. However, I wrote PEP 3121
Re: Multiple interpreters retaining huge amounts of memory
> If you are going to make a comment such as 'multi-interpreter feature
> doesn't really work' you really should substantiate it by pointing to
> where it is documented what the problems are or enumerate yourself
> exactly what the issues are. There is already enough FUD being spread
> around about the ability to run multiple sub interpreters in an
> embedded Python application, so adding more doesn't help.

I don't think the limitations have been documented in a systematic
manner. Some of the problems I know of are:
- objects can easily get shared across interpreters, and often are.
  This is particularly true for static variables that extensions keep,
  and for static type objects.
- Py_EndInterpreter doesn't guarantee that all objects are released,
  and may leak. This is the problem that the OP seems to have.
  All it does is to clear modules, sys, builtins, and a few other
  things; it is then up to reference counting and the cycle GC
  whether this releases all memory or not.
- the mechanism of PEP 311 doesn't work for multiple interpreters.

> Oh, it would also be nice to know exactly what embedded systems you
> have developed which make use of multiple sub interpreters so we can
> gauge with what standing you have to make such a comment.

I have never used that feature myself. However, I wrote PEP 3121
to overcome some of its limitations.

Regards,
Martin
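The point about reference counting versus the cycle GC can be seen in
plain Python. This is a minimal sketch with hypothetical names (Node,
make_cycle); it only shows that objects in a reference cycle outlive
the last external reference until the collector runs, and says nothing
about what Py_EndInterpreter itself does.

```python
import gc
import weakref


class Node:
    """Plain object that can take part in a reference cycle."""


def make_cycle():
    a, b = Node(), Node()
    a.peer, b.peer = b, a      # a and b keep each other alive
    return weakref.ref(a)      # a weak reference adds no refcount


was_enabled = gc.isenabled()
gc.disable()                   # keep the collector from running on its own
try:
    probe = make_cycle()
    # The local names are gone, but refcounts never reached zero.
    alive_after_refcounting = probe() is not None
    gc.collect()
    alive_after_gc = probe() is not None
finally:
    if was_enabled:
        gc.enable()
```

Memory held this way is only reclaimed if a collection actually happens
and nothing else, such as a static variable in an extension, still
refers to the objects.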
Re: Multiple interpreters retaining huge amounts of memory
On Feb 2, 12:34 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Is there some way to track references per interpreter, or to get the
> > memory allocator to set up separate arenas per interpreter so that it
> > can remove all allocated memory when the interpreter exits?
>
> No. The multi-interpreter feature doesn't really work, so you are
> basically on your own. If you find out what the problem is, please
> submit patches to bugs.python.org.
>
> In any case, the strategy you propose (with multiple arenas) would *not*
> work, since some objects have to be shared across interpreters.
>
> Regards,
> Martin

The multi interpreter feature has some limitations, but if you know
what you are doing and your application can be run within those
limitations then it works fine.

If you are going to make a comment such as 'multi-interpreter feature
doesn't really work' you really should substantiate it by pointing to
where it is documented what the problems are or enumerate yourself
exactly what the issues are. There is already enough FUD being spread
around about the ability to run multiple sub interpreters in an
embedded Python application, so adding more doesn't help.

Oh, it would also be nice to know exactly what embedded systems you
have developed which make use of multiple sub interpreters so we can
gauge with what standing you have to make such a comment.

Graham
Re: Multiple interpreters retaining huge amounts of memory
> Is there some way to track references per interpreter, or to get the
> memory allocator to set up separate arenas per interpreter so that it
> can remove all allocated memory when the interpreter exits?

No. The multi-interpreter feature doesn't really work, so you are
basically on your own. If you find out what the problem is, please
submit patches to bugs.python.org.

In any case, the strategy you propose (with multiple arenas) would *not*
work, since some objects have to be shared across interpreters.

Regards,
Martin
Multiple interpreters retaining huge amounts of memory
I have an application that simultaneously extends and embeds the python
interpreter. It is threaded, but all python calls are performed in one
thread. Several interpreters are running simultaneously -- the
application receives an event, activates a particular interpreter, and
calls some python code.

An interpreter's life cycle is to start, load a bunch of extension
modules, run intermittently for 30-40 minutes, and end. At some point,
the application calls Py_EndInterpreter on each interpreter.

My memory allocation goes up by about 1MB per interpreter, of which I
know that 2k (swig types) are really leaked. gc.garbage doesn't have
any cycles.

Is there some way to track references per interpreter, or to get the
memory allocator to set up separate arenas per interpreter so that it
can remove all allocated memory when the interpreter exits?

Thanks