Re: 2.6, 3.0, and truly independent interpreters

Glenn Linderman Fri, 24 Oct 2008 14:00:55 -0700

On approximately 10/24/2008 1:09 PM, came the following characters from
the keyboard of Rhamphoryncus:

On Oct 24, 1:02 pm, Glenn Linderman <[EMAIL PROTECTED]> wrote:

On approximately 10/24/2008 8:42 AM, came the following characters from
the keyboard of Andy O'Meara:

Glenn, great post and points!

Thanks. I need to admit here that while I've got a fair bit of
professional programming experience, I'm quite new to Python -- I've not
learned its internals, nor even the full extent of its rich library. So
I have some questions that are partly about the goals of the
applications being discussed, partly about how Python is constructed,
and partly about how the library is constructed. I'm hoping to get a
better understanding of all of these; perhaps once a better
understanding is achieved, limitations will be understood, and maybe
solutions be achievable.

Let me define some speculative Python interpreters; I think the first is
today's Python:

PyA: Has a GIL. PyA threads can run within a process, but are
effectively serialized at the places where the GIL is obtained/released.
Needs the GIL because that solves lots of problems with non-reentrant
code (an example of non-reentrant code is code that uses global (C
global, or C static) variables; note that I'm not talking about Python
vars declared global... they are only module global). In this model,
non-reentrant code could include pieces of the interpreter, and/or
extension modules.

PyB: No GIL. PyB threads acquire/release a lock around each reference to
a global variable (like the "with" feature). Requires massive recoding of
all code that contains global variables. Reduces performance
significantly by the increased cost of obtaining and releasing locks.
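
A minimal sketch of the PyB idea at the Python level, assuming a
module-level variable stands in for a C global. The names _counter,
_counter_lock, increment, read and run are hypothetical, chosen only for
illustration. Every touch of the global pays for one lock
acquire/release, which is where the performance cost comes from.

```python
import threading

# Hypothetical module-level global, standing in for a C global/static.
_counter = 0
_counter_lock = threading.Lock()  # one lock guarding _counter


def increment():
    """Each reference to the global is wrapped in an acquire/release."""
    global _counter
    with _counter_lock:
        _counter += 1


def read():
    """Reads take the same lock, so they also pay the locking cost."""
    with _counter_lock:
        return _counter


def run(n_threads=4, n_iters=1000):
    """Start n_threads workers that each call increment() n_iters times."""
    def worker():
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return read()
```

The "massive recoding" shows up even in this toy: every function that
touches the global has to know about its lock.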

PyC: No locks. Instead, recoding is done to eliminate global variables
(interpreter requires a state structure to be passed in). Extension
modules that use globals are prohibited... this eliminates large
portions of the library, or requires massive recoding. PyC threads do
not share data between threads except by explicit interfaces.
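
A rough sketch of the PyC approach, assuming the interpreter's globals
are gathered into one state object that is passed to every function.
InterpState, its fields, and import_module are hypothetical names for
illustration, not CPython's actual structures.

```python
from dataclasses import dataclass, field


@dataclass
class InterpState:
    """Hypothetical per-interpreter state that replaces global variables."""
    modules: dict = field(default_factory=dict)
    call_depth: int = 0


def import_module(state, name):
    """All mutable state lives in `state`; no global is read or written."""
    return state.modules.setdefault(name, {"__name__": name})


# Two independent interpreter instances share nothing.
interp_a = InterpState()
interp_b = InterpState()
import_module(interp_a, "spam")
```

After this runs, "spam" is known only to interp_a; interp_b cannot see
it, so no lock is needed between the two.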

PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
global variables, and each interpreter instance is provided a state
structure. There is still a GIL, however, because globals are
potentially still used by some modules. Code is added to detect use of
global variables by a module, or some contract is written whereby a
module can be declared to be reentrant and global-free. PyA threads will
obtain the GIL as they would today. PyC threads would be available to be
created. PyC instances refuse to call non-reentrant modules, but also
need not obtain the GIL... PyC threads would have limited module support
initially, but over time, most modules can be migrated to be reentrant
and global-free, so they can be used by PyC instances. Most 3rd-party
libraries today are starting to care about reentrancy anyway, because of
the popularity of threads.


PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable.  A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability.  Most other
shareable objects are immutable.  Each thread is run in its own
private monitor, and thus protected from the normal threading memory
model nasties.  Alas, this gives you all the semantics, but you still
need scalable garbage collection... and CPython's refcounting needs the
GIL.

Hmm. So I think your PyE is an attempt to be more explicit about what I
said above in PyC: PyC threads do not share data between threads except
by explicit interfaces. I consider your definitions of shared data types
somewhat orthogonal to the types of threads, in that both PyA and PyC
threads could use these new shared data items.

I think/hope that you meant that "many types are now only allowed to be
non-shareable"? At least, I think that should be the default; they
should be within the context of a single, independent interpreter
instance, so other interpreters don't even know they exist, much less
how to share them. If so, then I understand most of the rest of your
paragraph, and it could be a way of providing shared objects, perhaps.

I don't understand the comment that CPython's refcounting needs the
GIL... yes, it needs the GIL if multiple threads see the object, but not
for private objects... only one thread uses the private objects... so
today's refcounting should suffice... with each interpreter doing its
own refcounting and collecting its own garbage.

Shared objects would have to do refcounting in a protected way, under
some lock. One "easy" solution would be to have just two types of
objects: non-shared private objects in a thread, and global shared
objects. Access to global shared objects would require grabbing the GIL,
accessing the object, and then releasing the GIL. An interface could
allow for grabbing and releasing the GIL around a block of accesses to
shared objects (with GIL:). This could reduce the number of GIL
acquires. Then the reference counting for those objects would also be
done under the GIL. And the garbage collecting? By another PyA thread,
perhaps, that grabs the GIL by default? Or a PyC one that explicitly
grabs the GIL and does a step of global garbage collection?
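
A minimal sketch of what "with GIL:" might look like, assuming the
global lock is exposed to Python code as a re-entrant lock. Here GIL is
just a threading.RLock standing in for the real thing, and
SharedCounter, shared and batch_update are hypothetical names.

```python
import threading

# Stand-in for the global lock; the real GIL is not exposed like this.
GIL = threading.RLock()


class SharedCounter:
    """A 'global shared object': every access happens under GIL."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        with GIL:
            self.value += n


shared = SharedCounter()


def batch_update(amounts):
    """Hold GIL once around a whole block of shared-object accesses."""
    with GIL:
        for n in amounts:
            shared.add(n)  # re-enters the RLock already held by this thread
        return shared.value
```

The re-entrant lock is a design assumption: it lets a block that already
holds the lock call methods that take it again. Those inner acquires
cannot block, so only the outer one can contend with other threads.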

A more complex, more parallel solution would allow for independent
groups of shared objects. Of course, once there is more than one lock
involved, there is more potential for deadlock, but it also provides for
more parallelism. So a shared object might inherit from a "concurrency
group" which would have a lock that could be acquired (with conc_group:)
for access to those data items. Again, the reference counting would be
done under that lock for that group of objects, and garbage collecting
those objects would potentially require that lock as well...
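
One way "with conc_group:" could be sketched, assuming a group is simply
an object that owns a lock and acts as a context manager.
ConcurrencyGroup and SharedList are hypothetical names, and this sketch
has objects refer to their group rather than inherit from it, purely to
keep the example short.

```python
import threading


class ConcurrencyGroup:
    """Hypothetical group: one lock shared by all member objects."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.RLock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # do not swallow exceptions


class SharedList:
    """A shared object whose accesses are protected by its group's lock."""

    def __init__(self, group):
        self.group = group
        self.items = []

    def append(self, item):
        with self.group:
            self.items.append(item)


# Two independent groups: work on one never waits for the other's lock.
audio = ConcurrencyGroup("audio")
video = ConcurrencyGroup("video")
samples = SharedList(audio)
frames = SharedList(video)

with audio:  # one acquire covers the whole block of accesses
    samples.append(1)
    samples.append(2)
```

A thread that needs both groups would have to take the locks in a fixed
order, which is the deadlock risk that comes with more than one lock.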

The solution with multiple concurrency groups allows for such groups to
contain a single shared object, or many (probably related) shared
objects. So the application gets a choice of the granularity of sharing
and locking, and can choose the number of locks to optimize performance
and achieve correctness. This sort of shared data among threads,
though, suffers in the limit from all the problems described in the
Berkeley paper. More reliable programs might be achieved by using
straight PyC threads, and some very limited "data ports" that can be
combined using a higher-order flow control concept, as outlined in the
paper.
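
A small sketch of the "data port" style, assuming a port can be
approximated by a queue.Queue and that the worker touches nothing but
its two ports. The names worker and square_all are hypothetical. These
are ordinary Python threads, so the sketch shows only the sharing
discipline, not GIL-free execution.

```python
import queue
import threading


def worker(inbox, outbox):
    """Shares nothing with the caller except the two ports."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more input, shut down
            break
        outbox.put(item * item)


def square_all(values):
    """Feed values through the worker's input port and collect results."""
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for v in values:
        inbox.put(v)
    inbox.put(None)
    t.join()
    # A single worker reading a FIFO queue preserves input order.
    return [outbox.get() for _ in values]
```

All synchronization lives inside the queues, so neither side needs an
explicit lock.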

While Python might be extended with these flow control concepts, they
could be added gradually over time, and in the embedded case, could be
implemented in some other language.



--
Glenn
------------------------------------------------------------------------

.     _|_|_|  _|
.   _|        _|    _|_|    _|_|_|    _|_|_|
.   _|  _|_|  _|  _|_|_|_|  _|    _|  _|    _|
.   _|    _|  _|  _|        _|    _|  _|    _|
.     _|_|_|  _|    _|_|_|  _|    _|  _|    _|

------------------------------------------------------------------------

Obstacles are those frightful things you see when you take your eyes off
of the goal. --Henry Ford

--
http://mail.python.org/mailman/listinfo/python-list
