Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-20 Thread M.-A. Lemburg
On 2007-01-20 00:01, Brett Cannon wrote:
 On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 On 2007-01-19 22:33, Brett Cannon wrote:
 That's a typical error situation you get in __del__ methods at
 the time the interpreter is shut down.

 Yeah, but in this case this is at the end of Py_Initialize() for the
 stuff I am doing to the interpreter.  =)
 Is that in some error branch of Py_Initialize() ? Otherwise
 I don't see how the modules could get garbage-collected.

 
 Nope, it's code I am adding to clean out sys.modules of stuff the user
 didn't import themselves; it's for security reasons.

I'm not sure whether that's really going to increase
security: unloading of modules usually isn't safe and you
cannot be sure that it's possible to reinitialize a C
module once it has been loaded in the process. For Python
modules this is often possible, but there still may be
side-effects of the import that you cannot easily undo.

Perhaps you should just move those modules out to a different
dictionary and keep track of it in the import mechanism, so
that while you can't access the module directly via sys.modules,
the import mechanism still knows that it has been loaded and
reinserts it into sys.modules if it gets imported again.

I think that you get more security by explicitly
limiting which modules and packages you allow to be imported
in the first place and restricting what can be done with
sys.path and sys.modules.
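
One way to picture such a restriction, sketched with the modern `importlib` meta path hook (which postdates this thread). `AllowlistFinder` and the allowlist contents are assumptions made for illustration only.

```python
import importlib.abc
import sys


class AllowlistFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that vetoes imports outside an allowlist."""

    def __init__(self, allowed):
        self.allowed = frozenset(allowed)

    def find_spec(self, fullname, path=None, target=None):
        top_level = fullname.partition(".")[0]
        if top_level not in self.allowed:
            raise ImportError("import of %r is not allowed" % fullname)
        return None  # allowed: let the regular finders locate it


finder = AllowlistFinder({"encodings", "codecs"})
sys.meta_path.insert(0, finder)
try:
    import colorsys  # not on the allowlist, and assumed not yet imported
    blocked = False
except ImportError:
    blocked = True
finally:
    sys.meta_path.remove(finder)
```

Finders are only consulted for names that are missing from sys.modules, so anything already loaded bypasses the check. That is why the contents of sys.modules matter in the first place.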

 I'm not exactly sure which global state you are referring to. The
 alias map, the cache used by the search function?

 encodings._cache .

 Note that the search function registry is a global managed
 in the thread state (it's not stored in any module).

 Right, but that is not the issue.  If you have deleted the reference
 to the encodings module from sys.modules it then sets encodings._cache
 to None.  After the deletion, if you try to encode/decode a unicode
 string you get an AttributeError about how encodings._cache does not
 have a 'get' method since it is now None instead of a dict.  The
 function is fine and still runs, it's just that the global state it
 depends on is no longer the way it assumes it should be.
 While I could add some tricks to have the cache dictionary stay
 alive even after the globals were set to None, I doubt that this
 will really fix the problem.

 The encodings package relies on the import mechanism, the codecs
 module and the _codecs builtin module. Any of these could fail
 to work depending on the order in which the modules get
 GCed.

 There's a reason why things in Py_Finalize() are as carefully
 ordered :-) Perhaps we need to apply some reordering to the
 steps in Py_Initialize() ?!

 
 Nah, I just  need to not delete the modules.  =)

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 20 2007)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-20 Thread Brett Cannon
On 1/20/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 On 2007-01-20 00:01, Brett Cannon wrote:
  On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
  On 2007-01-19 22:33, Brett Cannon wrote:
  That's a typical error situation you get in __del__ methods at
  the time the interpreter is shut down.
 
  Yeah, but in this case this is at the end of Py_Initialize() for the
  stuff I am doing to the interpreter.  =)
  Is that in some error branch of Py_Initialize() ? Otherwise
  I don't see how the modules could get garbage-collected.
 
 
  Nope, it's code I am adding to clean out sys.modules of stuff the user
  didn't import themselves; it's for security reasons.

 I'm not sure whether that's really going to increase
 security: unloading of modules usually isn't safe and you
 cannot be sure that it's possible to reinitialize a C
 module once it has been loaded in the process. For Python
 modules this is often possible, but there still may be
 side-effects of the import that you cannot easily undo.

 Perhaps you should just move those modules out to a different
 dictionary and keep track of it in the import mechanism, so
 that while you can't access the module directly via sys.modules,
 the import mechanism still knows that it has been loaded and
 reinserts it into sys.modules if it gets imported again.


That's an idea.

 I think that you get more security by explicitly
 limiting which modules and packages you allow to be imported
 in the first place and restricting what can be done with
 sys.path and sys.modules.


That's what I am doing.  I just wanted to simplify things by having
import not worry about what is already in sys.modules and just always
assume what is there is safe.

-Brett


Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-19 Thread M.-A. Lemburg
On 2007-01-18 20:53, Brett Cannon wrote:
 I have discovered an issue relating to func_globals for functions and
 the deallocation of the module it is contained within.  Let's say you
 store a reference to the function encodings.search_function from the
 'encodings' module (this came up in C code, but I don't see why it
 couldn't happen in Python code).  Then you delete the one reference to
 the module that is stored in sys.modules, leading to its deallocation.
  That triggers the setting of None to every value in
 encodings.__dict__.
 
 Oops, now the global namespace for that module has everything valued
 at None.  The dict doesn't get deallocated since a reference is held
 by encodings.search_function.func_globals and there is still a
 reference to that (technically held in the interpreter's
 codec_search_path field).  So the function can still execute, but
 throws exceptions like AttributeError because a module variable that
 once held a dict now has None and thus doesn't have the 'get' method.

That's a typical error situation you get in __del__ methods at
the time the interpreter is shut down.

The main reason for setting everything to None first is to
break circular references and make sure that at least some
of the object destructors can run.

 My question is whether this is at all worth trying to rectify.  Since
 Google didn't turn anything up I am going to guess this is not exactly
 a common thing.  =)  That would lead me to believe some (probably
 most) of you will say, just leave it alone and work around it.

If you can come up with a better way, sure :-)

 The other option I can think of is to store a reference to the module
 instead of just to its __dict__ in the function.  The problem with
 that is we end up with a circular dependency of the functions in
 modules having a reference to the module but then the module having a
 reference to the functions.  I tried not having the values in the
 module's __dict__ set to None if the reference count was above 1 and
 that solved this issue, but that leads to dangling references on
 anything in that dict that does not have a reference stored away
 somewhere else like encodings.search_function.
 
 Anybody have any ideas on how to deal with this short of rewriting
 some codecs stuff so that they don't depend on global state in the
 module or just telling me to just live with it?

I'm not exactly sure which global state you are referring to. The
alias map, the cache used by the search function?

Note that the search function registry is a global managed
in the thread state (it's not stored in any module).

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-19 Thread Josiah Carlson

Martin v. Löwis [EMAIL PROTECTED] wrote:
 
 Josiah Carlson schrieb:
  Seems to me like a bug, but the bug could be fixed if the module's
  dictionary kept a (circular) reference to the module object.  Who else
  has been waiting for a __module__ attribute?
 
 This is the time machine at work:
 
 py> import encodings
 py> encodings.search_function.__module__
 'encodings'
 
 It's a string, rather than the module object, precisely to avoid cyclic
 references.

I was saying that it would be nice if the following were true:

 >>> encodings.__module__
<module 'encodings' from 'C:\python25\lib\encodings\__init__.pyc'>

That would make it easier for functions inside a module to pass around
references to the module namespace (I've had the need to do so before,
and have ended up using sys.modules[__name__], but that doesn't always
work).

So what if it is a circular reference (module references dict which
references module), we've got a GC which handles cycles just fine (when
users try not to be too smart).
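
A small sketch of the cycle being described, assuming a current CPython 3 interpreter (where module objects can be weakly referenced). The attribute name `self_ref` is hypothetical and stands in for the proposed module-level `__module__`.

```python
import gc
import types
import weakref

mod = types.ModuleType("demo")
mod.self_ref = mod          # module -> __dict__ -> module: a reference cycle

probe = weakref.ref(mod)    # lets us observe whether the module is freed
del mod                     # the cycle alone now keeps the module alive
gc.collect()                # the cycle collector reclaims it

collected = probe() is None
```

This only exercises the simple case, with no `__del__` methods involved.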

 - Josiah



Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-19 Thread Brett Cannon
On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 On 2007-01-18 20:53, Brett Cannon wrote:
  I have discovered an issue relating to func_globals for functions and
  the deallocation of the module it is contained within.  Let's say you
  store a reference to the function encodings.search_function from the
  'encodings' module (this came up in C code, but I don't see why it
  couldn't happen in Python code).  Then you delete the one reference to
  the module that is stored in sys.modules, leading to its deallocation.
   That triggers the setting of None to every value in
  encodings.__dict__.
 
  Oops, now the global namespace for that module has everything valued
  at None.  The dict doesn't get deallocated since a reference is held
  by encodings.search_function.func_globals and there is still a
  reference to that (technically held in the interpreter's
  codec_search_path field).  So the function can still execute, but
  throws exceptions like AttributeError because a module variable that
  once held a dict now has None and thus doesn't have the 'get' method.

 That's a typical error situation you get in __del__ methods at
 the time the interpreter is shut down.


Yeah, but in this case this is at the end of Py_Initialize() for the
stuff I am doing to the interpreter.  =)

 The main reason for setting everything to None first is to
 break circular references and make sure that at least some
 of the object destructors can run.


I know the reason, it just happens to occur at a bad time for me.

  My question is whether this is at all worth trying to rectify.  Since
  Google didn't turn anything up I am going to guess this is not exactly
  a common thing.  =)  That would lead me to believe some (probably
  most) of you will say, just leave it alone and work around it.

 If you can come up with a better way, sure :-)

  The other option I can think of is to store a reference to the module
  instead of just to its __dict__ in the function.  The problem with
  that is we end up with a circular dependency of the functions in
  modules having a reference to the module but then the module having a
  reference to the functions.  I tried not having the values in the
  module's __dict__ set to None if the reference count was above 1 and
  that solved this issue, but that leads to dangling references on
  anything in that dict that does not have a reference stored away
  somewhere else like encodings.search_function.
 
  Anybody have any ideas on how to deal with this short of rewriting
  some codecs stuff so that they don't depend on global state in the
  module or just telling me to just live with it?

 I'm not exactly sure which global state you are referring to. The
 alias map, the cache used by the search function?


encodings._cache .

 Note that the search function registry is a global managed
 in the thread state (it's not stored in any module).


Right, but that is not the issue.  If you have deleted the reference
to the encodings module from sys.modules it then sets encodings._cache
to None.  After the deletion, if you try to encode/decode a unicode
string you get an AttributeError about how encodings._cache does not
have a 'get' method since it is now None instead of a dict.  The
function is fine and still runs, it's just that the global state it
depends on is no longer the way it assumes it should be.

-Brett

 --
 Marc-Andre Lemburg
 eGenix.com



Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-19 Thread Brett Cannon
On 1/18/07, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Brett Cannon schrieb:
  Anybody have any ideas on how to deal with this short of rewriting
  some codecs stuff so that they don't depend on global state in the
  module or just telling me to just live with it?

 There is an old patch by Armin Rigo ( python.org/sf/812369 ), which
 attempts to implement shutdown based on gc, rather than the explicit
 clearing of modules. It would be good if that could be put to work;
 I don't know what undesirable side effects doing so would cause.


I will have a look.

 Short of that, I don't think Python needs to support explicit deletion
 of the encodings module from sys.modules when somebody still has a
 reference to the search function. Don't do that, then.

=)  Yeah.  As of this moment I am leaving __builtin__, exceptions,
encodings, codecs, encodings.utf_8, warnings, and sys.  I am deleting
all other modules after Py_Initialize finishes its thing.  I need to
do a security audit on all of those modules before I permanently let
them stick around (which is what I was hoping to avoid).  I am also
hoping to make the sys module not be required to stay since it is only
there because of the amount of stuff that is put into the module
before its __dict__ is cached by the import machinery.

-Brett


Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-19 Thread Brett Cannon
On 1/19/07, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 On 2007-01-19 22:33, Brett Cannon wrote:
  That's a typical error situation you get in __del__ methods at
  the time the interpreter is shut down.
 
 
  Yeah, but in this case this is at the end of Py_Initialize() for the
  stuff I am doing to the interpreter.  =)

 Is that in some error branch of Py_Initialize() ? Otherwise
 I don't see how the modules could get garbage-collected.


Nope, it's code I am adding to clean out sys.modules of stuff the user
didn't import themselves; it's for security reasons.

  I'm not exactly sure which global state you are referring to. The
  alias map, the cache used by the search function?
 
 
  encodings._cache .
 
  Note that the search function registry is a global managed
  in the thread state (it's not stored in any module).
 
 
  Right, but that is not the issue.  If you have deleted the reference
  to the encodings module from sys.modules it then sets encodings._cache
  to None.  After the deletion, if you try to encode/decode a unicode
  string you get an AttributeError about how encodings._cache does not
  have a 'get' method since it is now None instead of a dict.  The
  function is fine and still runs, it's just that the global state it
  depends on is no longer the way it assumes it should be.

 While I could add some tricks to have the cache dictionary stay
 alive even after the globals were set to None, I doubt that this
 will really fix the problem.

 The encodings package relies on the import mechanism, the codecs
 module and the _codecs builtin module. Any of these could fail
 to work depending on the order in which the modules get
 GCed.

 There's a reason why things in Py_Finalize() are as carefully
 ordered :-) Perhaps we need to apply some reordering to the
 steps in Py_Initialize() ?!


Nah, I just  need to not delete the modules.  =)

-Brett


Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-19 Thread Martin v. Löwis
Josiah Carlson schrieb:
 I was saying that it would be nice if the following were true:
 
  >>> encodings.__module__
 <module 'encodings' from 'C:\python25\lib\encodings\__init__.pyc'>

Ah, ok. It would be somewhat confusing, though, that __module__ is
sometimes a module object, and sometimes a string (it certainly confused
me).

 So what if it is a circular reference (module references dict which
 references module), we've got a GC which handles cycles just fine (when
 users try not to be too smart).

That remains to be seen in practice. Currently, modules are explicitly
cleared at shutdown. I think any cycle with an object implementing
__del__ will keep loads of modules alive, noncollectable for GC.

Regards,
Martin


[Python-Dev] Problem between deallocation of modules and func_globals

2007-01-18 Thread Brett Cannon
I have discovered an issue relating to func_globals for functions and
the deallocation of the module it is contained within.  Let's say you
store a reference to the function encodings.search_function from the
'encodings' module (this came up in C code, but I don't see why it
couldn't happen in Python code).  Then you delete the one reference to
the module that is stored in sys.modules, leading to its deallocation.
 That triggers the setting of None to every value in
encodings.__dict__.

Oops, now the global namespace for that module has everything valued
at None.  The dict doesn't get deallocated since a reference is held
by encodings.search_function.func_globals and there is still a
reference to that (technically held in the interpreter's
codec_search_path field).  So the function can still execute, but
throws exceptions like AttributeError because a module variable that
once held a dict now has None and thus doesn't have the 'get' method.
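
The failure mode can be reproduced in miniature. The sketch below does not rely on the interpreter clearing anything: it builds a throwaway module (`fake_encodings`, a hypothetical stand-in for `encodings`) and then overwrites the globals by hand, as an approximation of what module deallocation does in the scenario above.

```python
import types

# A stand-in module with the same shape as the problem: a function that
# depends on a module-level dict.
mod = types.ModuleType("fake_encodings")
exec(
    "_cache = {}\n"
    "def search_function(name):\n"
    "    return _cache.get(name)\n",
    mod.__dict__,
)

search_function = mod.search_function
namespace = search_function.__globals__  # spelled func_globals in Python 2

# Simulated clearing: every non-dunder global is overwritten with None.
# The dict itself survives because the function still references it.
for key in list(namespace):
    if not key.startswith("__"):
        namespace[key] = None

try:
    search_function("utf-8")
    error = None
except AttributeError as exc:
    error = str(exc)  # complains that None has no attribute 'get'
```

The function object is intact and callable; only the state it looks up through its globals has been replaced.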

My question is whether this is at all worth trying to rectify.  Since
Google didn't turn anything up I am going to guess this is not exactly
a common thing.  =)  That would lead me to believe some (probably
most) of you will say, just leave it alone and work around it.

The other option I can think of is to store a reference to the module
instead of just to its __dict__ in the function.  The problem with
that is we end up with a circular dependency of the functions in
modules having a reference to the module but then the module having a
reference to the functions.  I tried not having the values in the
module's __dict__ set to None if the reference count was above 1 and
that solved this issue, but that leads to dangling references on
anything in that dict that does not have a reference stored away
somewhere else like encodings.search_function.

Anybody have any ideas on how to deal with this short of rewriting
some codecs stuff so that they don't depend on global state in the
module or just telling me to just live with it?

-Brett


Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-18 Thread Josiah Carlson

Brett Cannon [EMAIL PROTECTED] wrote:
 I have discovered an issue relating to func_globals for functions and
 the deallocation of the module it is contained within.  Let's say you
 store a reference to the function encodings.search_function from the
 'encodings' module (this came up in C code, but I don't see why it
 couldn't happen in Python code).  Then you delete the one reference to
 the module that is stored in sys.modules, leading to its deallocation.
  That triggers the setting of None to every value in
 encodings.__dict__.
[snip]
 Anybody have any ideas on how to deal with this short of rewriting
 some codecs stuff so that they don't depend on global state in the
 module or just telling me to just live with it?

I would have presumed that keeping a reference to a function should have
kept the module alive.  Why?  If a function keeps a reference to a
module's globals, then even if the module is deleted, the module's
dictionary should still persist, because there exists a reference to it,
through the reference to the function.

Seems to me like a bug, but the bug could be fixed if the module's
dictionary kept a (circular) reference to the module object.  Who else
has been waiting for a __module__ attribute?


 - Josiah



Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-18 Thread Martin v. Löwis
Brett Cannon schrieb:
 Anybody have any ideas on how to deal with this short of rewriting
 some codecs stuff so that they don't depend on global state in the
 module or just telling me to just live with it?

There is an old patch by Armin Rigo ( python.org/sf/812369 ), which
attempts to implement shutdown based on gc, rather than the explicit
clearing of modules. It would be good if that could be put to work;
I don't know what undesirable side effects doing so would cause.

Short of that, I don't think Python needs to support explicit deletion
of the encodings module from sys.modules when somebody still has a
reference to the search function. Don't do that, then.

Regards,
Martin


Re: [Python-Dev] Problem between deallocation of modules and func_globals

2007-01-18 Thread Martin v. Löwis
Josiah Carlson schrieb:
 Seems to me like a bug, but the bug could be fixed if the module's
 dictionary kept a (circular) reference to the module object.  Who else
 has been waiting for a __module__ attribute?

This is the time machine at work:

py> import encodings
py> encodings.search_function.__module__
'encodings'

It's a string, rather than the module object, precisely to avoid cyclic
references.
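
Because `__module__` holds only a name, getting back to the module object takes a lookup in sys.modules. A short sketch, which works only while the module is still registered there:

```python
import encodings
import sys

name = encodings.search_function.__module__   # a plain string
module = sys.modules[name]                    # name -> module object lookup
```

If the entry has been removed from sys.modules, the lookup raises KeyError, which is the limitation of the string-based approach.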

Regards,
Martin