Re: [Python-Dev] Problem between deallocation of modules and func_globals
On 2007-01-20 00:01, Brett Cannon wrote:
> On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
>> On 2007-01-19 22:33, Brett Cannon wrote:
>>>> That's a typical error situation you get in __del__ methods at the time the interpreter is shut down.
>>> Yeah, but in this case this is at the end of Py_Initialize() for the stuff I am doing to the interpreter. =)
>> Is that in some error branch of Py_Initialize()? Otherwise I don't see how the modules could get garbage-collected.
> Nope, it's code I am adding to clean out sys.modules of stuff the user didn't import themselves; it's for security reasons.

I'm not sure whether that's really going to increase security: unloading of modules usually isn't safe and you cannot be sure that it's possible to reinitialize a C module once it has been loaded in the process. For Python modules this is often possible, but there still may be side-effects of the import that you cannot easily undo.

Perhaps you should just move those modules out to a different dictionary and keep track of it in the import mechanism, so that while you can't access the module directly via sys.modules, the import mechanism still knows that it has been loaded and reinserts it into sys.modules if it gets imported again.

I think that you get more security by explicitly limiting which modules and packages you allow to be imported in the first place and restricting what can be done with sys.path and sys.modules.

>>>> I'm not exactly sure which global state you are referring to. The alias map, the cache used by the search function?
>>> encodings._cache.
>>>> Note that the search function registry is a global managed in the thread state (it's not stored in any module).
>>> Right, but that is not the issue. If you have deleted the reference to the encodings module from sys.modules it then sets encodings._cache to None. After the deletion, if you try to encode/decode a unicode string you get an AttributeError about how encodings._cache does not have a 'get' method since it is now None instead of a dict. The function is fine and still runs, it's just that the global state it depends on is no longer the way it assumes it should be.
>> While I could add some tricks to have the cache dictionary stay alive even after the globals were set to None, I doubt that this will really fix the problem. The encoding package relies on the import mechanism, the codecs module and the _codecs builtin module. Any of these could fail to work depending on the order in which the modules get GCed. There's a reason why things in Py_Finalize() are as carefully ordered :-) Perhaps we need to apply some reordering to the steps in Py_Initialize()?!
> Nah, I just need to not delete the modules. =)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jan 20 2007)
Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Problem between deallocation of modules and func_globals
On 1/20/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
> On 2007-01-20 00:01, Brett Cannon wrote:
>> On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
>>> On 2007-01-19 22:33, Brett Cannon wrote:
>>>>> That's a typical error situation you get in __del__ methods at the time the interpreter is shut down.
>>>> Yeah, but in this case this is at the end of Py_Initialize() for the stuff I am doing to the interpreter. =)
>>> Is that in some error branch of Py_Initialize()? Otherwise I don't see how the modules could get garbage-collected.
>> Nope, it's code I am adding to clean out sys.modules of stuff the user didn't import themselves; it's for security reasons.
> I'm not sure whether that's really going to increase security: unloading of modules usually isn't safe and you cannot be sure that it's possible to reinitialize a C module once it has been loaded in the process. For Python modules this is often possible, but there still may be side-effects of the import that you cannot easily undo. Perhaps you should just move those modules out to a different dictionary and keep track of it in the import mechanism, so that while you can't access the module directly via sys.modules, the import mechanism still knows that it has been loaded and reinserts it into sys.modules if it gets imported again.

That's an idea.

> I think that you get more security by explicitly limiting which modules and packages you allow to be imported in the first place and restricting what can be done with sys.path and sys.modules.

That's what I am doing. I just wanted to simplify things by having import not worry about what is already in sys.modules and just always assume what is there is safe.

-Brett
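The idea of limiting which modules may be imported in the first place can be illustrated with a meta path finder. This is a sketch under loud assumptions: it uses the Python 3 importlib API (which did not exist in 2007), the names AllowListFinder, restricted_imports and try_import are hypothetical, and it is not the mechanism Brett was building. It is also not a security boundary on its own.

```python
import importlib
import importlib.abc
import sys
from contextlib import contextmanager


class AllowListFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that refuses top-level packages not in `allowed`."""

    def __init__(self, allowed):
        self.allowed = frozenset(allowed)

    def find_spec(self, fullname, path=None, target=None):
        top_level = fullname.partition(".")[0]
        if top_level in self.allowed:
            return None  # defer to the regular finders further down the list
        raise ImportError(f"import of {fullname!r} is not allowed")


@contextmanager
def restricted_imports(allowed):
    """Install the finder at the front of sys.meta_path for the block's duration."""
    finder = AllowListFinder(allowed)
    sys.meta_path.insert(0, finder)
    try:
        yield finder
    finally:
        sys.meta_path.remove(finder)


def try_import(name):
    """Return True if `name` could be imported, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


sys.modules.pop("colorsys", None)  # make sure the import really hits the finders
with restricted_imports({"encodings"}):
    blocked = try_import("colorsys")   # not on the allow list
    cached = try_import("sys")         # already in sys.modules, finder not consulted
with restricted_imports({"colorsys"}):
    allowed = try_import("colorsys")
print(blocked, cached, allowed)
```

The `cached` case shows the point made in the reply above: anything already in sys.modules bypasses the finders entirely, so the import system implicitly trusts what is there.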
Re: [Python-Dev] Problem between deallocation of modules and func_globals
On 2007-01-18 20:53, Brett Cannon wrote:
> I have discovered an issue relating to func_globals for functions and the deallocation of the module it is contained within. Let's say you store a reference to the function encodings.search_function from the 'encodings' module (this came up in C code, but I don't see why it couldn't happen in Python code). Then you delete the one reference to the module that is stored in sys.modules, leading to its deallocation. That triggers the setting of every value in encodings.__dict__ to None. Oops, now the global namespace for that module has everything valued at None. The dict doesn't get deallocated since a reference is held by encodings.search_function.func_globals and there is still a reference to that function (technically held in the interpreter's codec_search_path field). So the function can still execute, but throws exceptions like AttributeError because a module variable that once held a dict now has None and thus doesn't have the 'get' method.

That's a typical error situation you get in __del__ methods at the time the interpreter is shut down. The main reason for setting everything to None first is to break circular references and make sure that at least some of the object destructors can run.

> My question is whether this is at all worth trying to rectify. Since Google didn't turn anything up I am going to guess this is not exactly a common thing. =) That would lead me to believe some (probably most) of you will say, just leave it alone and work around it.

If you can come up with a better way, sure :-)

> The other option I can think of is to store a reference to the module instead of just to its __dict__ in the function. The problem with that is we end up with a circular dependency of the functions in modules having a reference to the module but then the module having a reference to the functions. I tried not having the values in the module's __dict__ set to None if the reference count was above 1 and that solved this issue, but that leads to dangling references on anything in that dict that does not have a reference stored away somewhere else like encodings.search_function.
>
> Anybody have any ideas on how to deal with this short of rewriting some codecs stuff so that they don't depend on global state in the module or just telling me to live with it?

I'm not exactly sure which global state you are referring to. The alias map, the cache used by the search function? Note that the search function registry is a global managed in the thread state (it's not stored in any module).

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Problem between deallocation of modules and func_globals
Martin v. Löwis [EMAIL PROTECTED] wrote:
> Josiah Carlson wrote:
>> Seems to me like a bug, but the bug could be fixed if the module's dictionary kept a (circular) reference to the module object. Who else has been waiting for a __module__ attribute?
> This is the time machine at work:
>
>     py> import encodings
>     py> encodings.search_function.__module__
>     'encodings'
>
> It's a string, rather than the module object, precisely to avoid cyclic references.

I was saying that it would be nice if the following were true:

    >>> encodings.__module__
    <module 'encodings' from 'C:\python25\lib\encodings\__init__.pyc'>

That would make it easier for functions inside a module to pass around references to the module namespace (I've had the need to do so before, and have ended up using sys.modules[__name__], but that doesn't always work).

So what if it is a circular reference (module references dict which references module), we've got a GC which handles cycles just fine (when users try not to be too smart).

- Josiah
Re: [Python-Dev] Problem between deallocation of modules and func_globals
On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
> On 2007-01-18 20:53, Brett Cannon wrote:
>> I have discovered an issue relating to func_globals for functions and the deallocation of the module it is contained within. Let's say you store a reference to the function encodings.search_function from the 'encodings' module (this came up in C code, but I don't see why it couldn't happen in Python code). Then you delete the one reference to the module that is stored in sys.modules, leading to its deallocation. That triggers the setting of every value in encodings.__dict__ to None. Oops, now the global namespace for that module has everything valued at None. The dict doesn't get deallocated since a reference is held by encodings.search_function.func_globals and there is still a reference to that function (technically held in the interpreter's codec_search_path field). So the function can still execute, but throws exceptions like AttributeError because a module variable that once held a dict now has None and thus doesn't have the 'get' method.
> That's a typical error situation you get in __del__ methods at the time the interpreter is shut down.

Yeah, but in this case this is at the end of Py_Initialize() for the stuff I am doing to the interpreter. =)

> The main reason for setting everything to None first is to break circular references and make sure that at least some of the object destructors can run.

I know the reason, it just happens to occur at a bad time for me.

>> My question is whether this is at all worth trying to rectify. Since Google didn't turn anything up I am going to guess this is not exactly a common thing. =) That would lead me to believe some (probably most) of you will say, just leave it alone and work around it.
> If you can come up with a better way, sure :-)
>> The other option I can think of is to store a reference to the module instead of just to its __dict__ in the function. The problem with that is we end up with a circular dependency of the functions in modules having a reference to the module but then the module having a reference to the functions. I tried not having the values in the module's __dict__ set to None if the reference count was above 1 and that solved this issue, but that leads to dangling references on anything in that dict that does not have a reference stored away somewhere else like encodings.search_function.
>>
>> Anybody have any ideas on how to deal with this short of rewriting some codecs stuff so that they don't depend on global state in the module or just telling me to live with it?
> I'm not exactly sure which global state you are referring to. The alias map, the cache used by the search function?

encodings._cache.

> Note that the search function registry is a global managed in the thread state (it's not stored in any module).

Right, but that is not the issue. If you have deleted the reference to the encodings module from sys.modules it then sets encodings._cache to None. After the deletion, if you try to encode/decode a unicode string you get an AttributeError about how encodings._cache does not have a 'get' method since it is now None instead of a dict. The function is fine and still runs, it's just that the global state it depends on is no longer the way it assumes it should be.

-Brett
Re: [Python-Dev] Problem between deallocation of modules and func_globals
On 1/18/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Brett Cannon wrote:
>> Anybody have any ideas on how to deal with this short of rewriting some codecs stuff so that they don't depend on global state in the module or just telling me to live with it?
> There is an old patch by Armin Rigo (python.org/sf/812369), which attempts to implement shutdown based on gc, rather than the explicit clearing of modules. It would be good if that could be put to work; I don't know what undesirable side effects doing so would cause.

I will have a look.

> Short of that, I don't think Python needs to support explicit deletion of the encodings module from sys.modules when somebody still has a reference to the search function. Don't do that, then.

=) Yeah. As of this moment I am leaving __builtin__, exceptions, encodings, codecs, encodings.utf_8, warnings, and sys. I am deleting all other modules after Py_Initialize finishes its thing. I need to do a security audit on all of those modules before I permanently let them stick around (which is what I was hoping to avoid).

I am also hoping to make the sys module not be required to stay, since it is only there because of the amount of stuff that is put into the module before its __dict__ is cached by the import machinery.

-Brett
Re: [Python-Dev] Problem between deallocation of modules and func_globals
On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
> On 2007-01-19 22:33, Brett Cannon wrote:
>>> That's a typical error situation you get in __del__ methods at the time the interpreter is shut down.
>> Yeah, but in this case this is at the end of Py_Initialize() for the stuff I am doing to the interpreter. =)
> Is that in some error branch of Py_Initialize()? Otherwise I don't see how the modules could get garbage-collected.

Nope, it's code I am adding to clean out sys.modules of stuff the user didn't import themselves; it's for security reasons.

>>> I'm not exactly sure which global state you are referring to. The alias map, the cache used by the search function?
>> encodings._cache.
>>> Note that the search function registry is a global managed in the thread state (it's not stored in any module).
>> Right, but that is not the issue. If you have deleted the reference to the encodings module from sys.modules it then sets encodings._cache to None. After the deletion, if you try to encode/decode a unicode string you get an AttributeError about how encodings._cache does not have a 'get' method since it is now None instead of a dict. The function is fine and still runs, it's just that the global state it depends on is no longer the way it assumes it should be.
> While I could add some tricks to have the cache dictionary stay alive even after the globals were set to None, I doubt that this will really fix the problem. The encoding package relies on the import mechanism, the codecs module and the _codecs builtin module. Any of these could fail to work depending on the order in which the modules get GCed. There's a reason why things in Py_Finalize() are as carefully ordered :-) Perhaps we need to apply some reordering to the steps in Py_Initialize()?!

Nah, I just need to not delete the modules. =)

-Brett
Re: [Python-Dev] Problem between deallocation of modules and func_globals
Josiah Carlson wrote:
> I was saying that it would be nice if the following were true:
>
>     >>> encodings.__module__
>     <module 'encodings' from 'C:\python25\lib\encodings\__init__.pyc'>

Ah, ok. It would be somewhat confusing, though, that __module__ is sometimes a module object, and sometimes a string (it certainly confused me).

> So what if it is a circular reference (module references dict which references module), we've got a GC which handles cycles just fine (when users try not to be too smart).

That remains to be seen in practice. Currently, modules are explicitly cleared at shutdown. I think any cycle with an object implementing __del__ will keep loads of modules alive, noncollectable for GC.

Regards,
Martin
[Python-Dev] Problem between deallocation of modules and func_globals
I have discovered an issue relating to func_globals for functions and the deallocation of the module it is contained within. Let's say you store a reference to the function encodings.search_function from the 'encodings' module (this came up in C code, but I don't see why it couldn't happen in Python code). Then you delete the one reference to the module that is stored in sys.modules, leading to its deallocation. That triggers the setting of every value in encodings.__dict__ to None. Oops, now the global namespace for that module has everything valued at None. The dict doesn't get deallocated since a reference is held by encodings.search_function.func_globals and there is still a reference to that function (technically held in the interpreter's codec_search_path field). So the function can still execute, but throws exceptions like AttributeError because a module variable that once held a dict now has None and thus doesn't have the 'get' method.

My question is whether this is at all worth trying to rectify. Since Google didn't turn anything up I am going to guess this is not exactly a common thing. =) That would lead me to believe some (probably most) of you will say, just leave it alone and work around it.

The other option I can think of is to store a reference to the module instead of just to its __dict__ in the function. The problem with that is we end up with a circular dependency of the functions in modules having a reference to the module but then the module having a reference to the functions. I tried not having the values in the module's __dict__ set to None if the reference count was above 1 and that solved this issue, but that leads to dangling references on anything in that dict that does not have a reference stored away somewhere else like encodings.search_function.

Anybody have any ideas on how to deal with this short of rewriting some codecs stuff so that they don't depend on global state in the module or just telling me to live with it?

-Brett
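The failure described in the original post can be imitated in a few lines. The sketch below is written for Python 3, where func_globals is spelled __globals__, and it does not rely on the interpreter clearing anything: the helper clear_like_module_dealloc is a hypothetical stand-in that sets the module's values to None by hand, and fake_encodings is an invented module used only for illustration.

```python
import types

# Source for a stand-in module: a function that depends on a
# module-level cache dict, much like the codec search function.
SOURCE = """
_cache = {}

def search_function(name):
    return _cache.get(name)
"""


def make_module():
    module = types.ModuleType("fake_encodings")
    exec(SOURCE, module.__dict__)
    return module


def clear_like_module_dealloc(module):
    """Imitate the clearing step from the thread: set every global to None.

    This is an assumption-laden simulation, not the interpreter's own code;
    __builtins__ is left alone so the function can still run at all.
    """
    for key in list(module.__dict__):
        if key != "__builtins__":
            module.__dict__[key] = None


mod = make_module()
func = mod.search_function      # outside reference to the function
shared = func.__globals__ is mod.__dict__

clear_like_module_dealloc(mod)  # the dict survives, its values do not

try:
    func("utf-8")
    outcome = "no error"
except AttributeError as exc:
    outcome = "AttributeError"
    message = str(exc)

print(shared, outcome)
```

The function object and its globals dict stay alive because of the outside reference, but the lookup of _cache now yields None, which is the AttributeError on 'get' that the post describes.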
Re: [Python-Dev] Problem between deallocation of modules and func_globals
Brett Cannon [EMAIL PROTECTED] wrote:
> I have discovered an issue relating to func_globals for functions and the deallocation of the module it is contained within. Let's say you store a reference to the function encodings.search_function from the 'encodings' module (this came up in C code, but I don't see why it couldn't happen in Python code). Then you delete the one reference to the module that is stored in sys.modules, leading to its deallocation. That triggers the setting of every value in encodings.__dict__ to None.
[snip]
> Anybody have any ideas on how to deal with this short of rewriting some codecs stuff so that they don't depend on global state in the module or just telling me to live with it?

I would have presumed that keeping a reference to a function should have kept the module alive. Why? If a function keeps a reference to a module's globals, then even if the module is deleted, the module's dictionary should still persist, because there exists a reference to it, through the reference to the function.

Seems to me like a bug, but the bug could be fixed if the module's dictionary kept a (circular) reference to the module object. Who else has been waiting for a __module__ attribute?

- Josiah
Re: [Python-Dev] Problem between deallocation of modules and func_globals
Brett Cannon wrote:
> Anybody have any ideas on how to deal with this short of rewriting some codecs stuff so that they don't depend on global state in the module or just telling me to live with it?

There is an old patch by Armin Rigo (python.org/sf/812369), which attempts to implement shutdown based on gc, rather than the explicit clearing of modules. It would be good if that could be put to work; I don't know what undesirable side effects doing so would cause.

Short of that, I don't think Python needs to support explicit deletion of the encodings module from sys.modules when somebody still has a reference to the search function. Don't do that, then.

Regards,
Martin
Re: [Python-Dev] Problem between deallocation of modules and func_globals
Josiah Carlson wrote:
> Seems to me like a bug, but the bug could be fixed if the module's dictionary kept a (circular) reference to the module object. Who else has been waiting for a __module__ attribute?

This is the time machine at work:

    py> import encodings
    py> encodings.search_function.__module__
    'encodings'

It's a string, rather than the module object, precisely to avoid cyclic references.

Regards,
Martin
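Since __module__ is only a name, getting from a function back to its module object takes one more lookup. A minimal sketch, assuming the module is still registered in sys.modules (the lookup raises KeyError once the entry has been removed, which is the situation this thread is about):

```python
import encodings
import sys

# __module__ holds the module's name as a plain string, not the module itself.
module_name = encodings.search_function.__module__

# Resolve the name through sys.modules to reach the module object.
module_obj = sys.modules[module_name]

print(type(module_name).__name__, module_name)
print(module_obj is encodings)
```

Keeping only the name means the function does not hold a reference to the module object through this attribute, so no function-to-module reference cycle is created by it.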