On approximately 10/24/2008 1:09 PM, came the following characters from the keyboard of Rhamphoryncus:
On Oct 24, 1:02 pm, Glenn Linderman <[EMAIL PROTECTED]> wrote:
On approximately 10/24/2008 8:42 AM, came the following characters from
the keyboard of Andy O'Meara:

Glenn, great post and points!
Thanks. I need to admit here that while I've got a fair bit of
professional programming experience, I'm quite new to Python -- I've not
learned its internals, nor even the full extent of its rich library. So
I have some questions that are partly about the goals of the
applications being discussed, partly about how Python is constructed,
and partly about how the library is constructed. I'm hoping to get a
better understanding of all of these; perhaps once a better
understanding is achieved, limitations will be understood, and maybe
solutions be achievable.

Let me define some speculative Python interpreters; I think the first is
today's Python:

PyA: Has a GIL. PyA threads can run within a process, but are
effectively serialized at the points where the GIL is obtained/released.
Needs the GIL because that solves lots of problems with non-reentrant
code (an example of non-reentrant code is code that uses global (C
global, or C static) variables; note that I'm not talking about Python
vars declared global... they are only module global). In this model,
non-reentrant code could include pieces of the interpreter, and/or
extension modules.

PyB: No GIL. PyB threads acquire/release a lock around each reference to
a global variable (like "with" feature). Requires massive recoding of
all code that contains global variables. Reduces performance
significantly by the increased cost of obtaining and releasing locks.
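
To make the PyB cost concrete, here is a minimal sketch in today's Python
of a lock taken around every reference to a global. The names _counter,
_counter_lock, increment and run are made up for illustration. The real
recoding would be in C code, and this still runs under CPython's GIL, so
it shows only the locking pattern, not a GIL-free interpreter.

```python
import threading

# Hypothetical module-level ("global") state, guarded by its own lock.
_counter = 0
_counter_lock = threading.Lock()


def increment(times):
    """Bump the shared counter, taking the lock around every access."""
    global _counter
    for _ in range(times):
        with _counter_lock:  # one acquire/release per reference
            _counter += 1


def run(num_threads=4, times=10_000):
    """Reset the counter, run the workers, and return the final value."""
    global _counter
    with _counter_lock:
        _counter = 0
    workers = [threading.Thread(target=increment, args=(times,))
               for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    with _counter_lock:
        return _counter
```

Every pass through the loop pays for one acquire and one release, which is
the per-reference overhead PyB would impose.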

PyC: No locks. Instead, recoding is done to eliminate global variables
(interpreter requires a state structure to be passed in). Extension
modules that use globals are prohibited... this eliminates large
portions of the library, or requires massive recoding. PyC threads do
not share data between threads except by explicit interfaces.
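
A rough sketch of the PyC idea at the Python level, assuming a
hypothetical InterpState structure that stands in for what would
otherwise be globals. Each thread gets its own instance, and the
function doing the work touches nothing but the state it is handed.
None of these names come from CPython.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class InterpState:
    """Hypothetical per-interpreter state; replaces would-be globals."""
    name: str
    symbols: dict = field(default_factory=dict)
    steps: int = 0


def define(state, key, value):
    """Reentrant: reads and writes only the state passed in."""
    state.symbols[key] = value
    state.steps += 1


def run_isolated(names, definitions):
    """Run one thread per name, each with a private InterpState."""
    states = {name: InterpState(name) for name in names}

    def work(state):
        for key, value in definitions:
            define(state, key, (state.name, value))

    threads = [threading.Thread(target=work, args=(s,))
               for s in states.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return states
```

Because no thread ever sees another thread's InterpState, no lock is
needed around define().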

PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate
global variables, and each interpreter instance is provided a state
structure. There is still a GIL, however, because globals are
potentially still used by some modules. Code is added to detect use of
global variables by a module, or some contract is written whereby a
module can be declared to be reentrant and global-free. PyA threads will
obtain the GIL as they would today. PyC threads would be available to be
created. PyC instances refuse to call non-reentrant modules, but also
need not obtain the GIL... PyC threads would have limited module support
initially, but over time, most modules can be migrated to be reentrant
and global-free, so they can be used by PyC instances. Most 3rd-party
libraries today are starting to care about reentrancy anyway, because of
the popularity of threads.

PyE: objects are reclassified as shareable or non-shareable, many
types are now only allowed to be shareable.  A module and its classes
become shareable with the use of a __future__ import, and their
shareddict uses a read-write lock for scalability.  Most other
shareable objects are immutable.  Each thread is run in its own
private monitor, and thus protected from the normal threading memory
model nasties.  Alas, this gives you all the semantics, but you still
need scalable garbage collection.. and CPython's refcounting needs the
GIL.

Hmm. So I think your PyE is an attempt to be more explicit about what I said above in PyC: PyC threads do not share data between threads except by explicit interfaces. I consider your definitions of shared data types somewhat orthogonal to the types of threads, in that both PyA and PyC threads could use these new shared data items.

I think/hope that you meant that "many types are now only allowed to be non-shareable"? At least, I think that should be the default; they should be within the context of a single, independent interpreter instance, so other interpreters don't even know they exist, much less how to share them. If so, then I understand most of the rest of your paragraph, and it could be a way of providing shared objects, perhaps.

I don't understand the comment that CPython's refcounting needs the GIL... yes, it needs the GIL if multiple threads see the object, but not for private objects... only one thread uses the private objects... so today's refcounting should suffice... with each interpreter doing its own refcounting and collecting its own garbage.

Shared objects would have to do refcounting in a protected way, under some lock. One "easy" solution would be to have just two types of objects: non-shared private objects in a thread, and global shared objects. Access to global shared objects would require grabbing the GIL, accessing the object, and then releasing the GIL. An interface could allow for grabbing and releasing the GIL around a block of accesses to shared objects (with GIL:). This could reduce the number of GIL acquires. Then the reference counting for those objects would also be done under the GIL, and the garbage collecting? By another PyA thread, perhaps, that grabs the GIL by default? Or a PyC one that explicitly grabs the GIL and does a step of global garbage collection?
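
A sketch of what the "with GIL:" interface might look like from Python code. Everything here is hypothetical: GIL is an ordinary threading.RLock standing in for the interpreter-wide lock (it is not CPython's real GIL), and Shared is an invented wrapper for a global shared object.

```python
import threading

# Hypothetical stand-in for the interpreter-wide lock; not CPython's GIL.
GIL = threading.RLock()


class Shared:
    """Hypothetical global shared object; every access takes the lock."""

    def __init__(self, value=0):
        self._value = value

    def get(self):
        with GIL:
            return self._value

    def set(self, value):
        with GIL:
            self._value = value


def add_many(total, amounts):
    """Hold the lock once around a whole block of shared accesses."""
    with GIL:
        for amount in amounts:
            # The nested acquires in get()/set() are re-entries by the
            # thread that already holds the RLock, so they do not block.
            total.set(total.get() + amount)


def run(num_threads=4, amounts=(1, 2, 3)):
    """Run add_many in several threads and return the final total."""
    total = Shared()
    threads = [threading.Thread(target=add_many, args=(total, amounts))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total.get()
```

The block form also makes each read-modify-write in add_many atomic with respect to other threads, which separate get() and set() calls would not be.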

A more complex, more parallel solution would allow for independent groups of shared objects. Of course, once there is more than one lock involved, there is more potential for deadlock, but it also provides for more parallelism. So a shared object might inherit from a "concurrency group" which would have a lock that could be acquired (with conc_group:) for access to those data items. Again, the reference counting would be done under that lock for that group of objects, and garbage collecting those objects would potentially require that lock as well...
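
One way the "with conc_group:" idea could be sketched, assuming invented ConcurrencyGroup and SharedValue classes. Here objects belong to a group rather than inherit from it, which is a simplification. The transfer helper shows the usual guard against the deadlock risk: take multiple group locks in one fixed order (by name here, which assumes group names are unique).

```python
import threading
from contextlib import ExitStack


class ConcurrencyGroup:
    """Hypothetical group of shared objects guarded by one lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.RLock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # do not swallow exceptions


class SharedValue:
    """Hypothetical shared object; access it while holding its group."""

    def __init__(self, group, value):
        self.group = group
        self.value = value


def transfer(src, dst, amount):
    """Move amount between two shared values, possibly in two groups."""
    # Fixed acquisition order (by group name) so that two opposite
    # transfers cannot each hold one lock while waiting for the other.
    groups = sorted({src.group, dst.group}, key=lambda g: g.name)
    with ExitStack() as stack:
        for group in groups:
            stack.enter_context(group)
        src.value -= amount
        dst.value += amount
```

Putting both values in the same group reduces this to a single lock, while giving each value its own group maximizes parallelism at the cost of more locks to order.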

The solution with multiple concurrency groups allows for such groups to contain a single shared object, or many (probably related) shared objects. So the application gets a choice of the granularity of sharing and locking, and can choose the number of locks to optimize performance and achieve correctness. This sort of shared data among threads, though, suffers in the limit from all the problems described in the Berkeley paper. More reliable programs might be achieved by using straight PyC threads, and some very limited "data ports" that can be combined using a higher-order flow control concept, as outlined in the paper.
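
As a loose illustration of "data ports" (my reading, not the paper's definition), here is a sketch in which threads share nothing except queue.Queue objects wired into a pipeline. The names stage, run_pipeline and _DONE are invented, and under today's CPython these threads still share the GIL.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inport, outport):
    """Read items from inport, apply func, write results to outport."""
    while True:
        item = inport.get()
        if item is _DONE:
            outport.put(_DONE)  # pass the end marker downstream
            return
        outport.put(func(item))


def run_pipeline(funcs, items):
    """Chain one thread per function, connected only by queues."""
    ports = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage,
                                args=(func, ports[i], ports[i + 1]))
               for i, func in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        ports[0].put(item)
    ports[0].put(_DONE)
    results = []
    while True:
        item = ports[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

The wiring in run_pipeline plays the role of the higher-order flow control: the stages themselves never name each other or any shared object.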

While Python might be extended with these flow control concepts, they could be added gradually over time, and in the embedded case, could be implemented in some other language.


--
Glenn
------------------------------------------------------------------------

.     _|_|_|  _|
.   _|        _|    _|_|    _|_|_|    _|_|_|
.   _|  _|_|  _|  _|_|_|_|  _|    _|  _|    _|
.   _|    _|  _|  _|        _|    _|  _|    _|
.     _|_|_|  _|    _|_|_|  _|    _|  _|    _|

------------------------------------------------------------------------
Obstacles are those frightful things you see when you take your eyes off of the goal. --Henry Ford
--
http://mail.python.org/mailman/listinfo/python-list
