Wow, man. Excellent post. You want a job? The gui could use PyA threads for sure, and the audio thread could use PyC threads. It would not be a problem to limit the audio thread to only reentrant libraries.
This kind of thought is what I had in mind about finding a compromise, especially in the way that PyD would not break old code assuming that it could eventually be ported.

On Fri, Oct 24, 2008 at 11:02 AM, Glenn Linderman <[EMAIL PROTECTED]> wrote:
> On approximately 10/24/2008 8:42 AM, came the following characters from the
> keyboard of Andy O'Meara:
>>
>> Glenn, great post and points!
>>
>
> Thanks. I need to admit here that while I've got a fair bit of professional
> programming experience, I'm quite new to Python -- I've not learned its
> internals, nor even the full extent of its rich library. So I have some
> questions that are partly about the goals of the applications being
> discussed, partly about how Python is constructed, and partly about how the
> library is constructed. I'm hoping to get a better understanding of all of
> these; perhaps once a better understanding is achieved, limitations will be
> understood, and maybe solutions be achievable.
>
> Let me define some speculative Python interpreters; I think the first is
> today's Python:
>
> PyA: Has a GIL. PyA threads can run within a process; but are effectively
> serialized to the places where the GIL is obtained/released. Needs the GIL
> because that solves lots of problems with non-reentrant code (an example of
> non-reentrant code, is code that uses global (C global, or C static)
> variables – note that I'm not talking about Python vars declared global...
> they are only module global). In this model, non-reentrant code could
> include pieces of the interpreter, and/or extension modules.
>
> PyB: No GIL. PyB threads acquire/release a lock around each reference to a
> global variable (like "with" feature). Requires massive recoding of all code
> that contains global variables. Reduces performance significantly by the
> increased cost of obtaining and releasing locks.
>
> PyC: No locks.
> Instead, recoding is done to eliminate global variables
> (interpreter requires a state structure to be passed in). Extension modules
> that use globals are prohibited... this eliminates large portions of the
> library, or requires massive recoding. PyC threads do not share data between
> threads except by explicit interfaces.
>
> PyD: (A hybrid of PyA & PyC). The interpreter is recoded to eliminate global
> variables, and each interpreter instance is provided a state structure.
> There is still a GIL, however, because globals are potentially still used by
> some modules. Code is added to detect use of global variables by a module,
> or some contract is written whereby a module can be declared to be reentrant
> and global-free. PyA threads will obtain the GIL as they would today. PyC
> threads would be available to be created. PyC instances refuse to call
> non-reentrant modules, but also need not obtain the GIL... PyC threads would
> have limited module support initially, but over time, most modules can be
> migrated to be reentrant and global-free, so they can be used by PyC
> instances. Most 3rd-party libraries today are starting to care about
> reentrancy anyway, because of the popularity of threads.
>
> The assumptions here are that:
>
> Data-1) A Python interpreter doesn't provide any mechanism to share normal
> data among threads, they are independent... but message passing works.
> Data-2) A Python interpreter could be extended to provide mechanisms to
> share special data, and the data would come with an implicit lock.
> Data-3) A Python interpreter could be extended to provide unlocked access to
> special data, requiring the application to handle the synchronization
> between threads. Data of type 2 could be used to control access to data of
> type 3.
> This type of data could be large, or frequently referenced data, but
> only by a single thread at a time, with major handoffs to a different thread
> synchronized by the application in whatever way it chooses.
>
> Context-1) A Python interpreter would know about threads it spawns, and
> could pass in a block of context (in addition to the state structure) as a
> parameter to a new thread. That block of context would belong to the thread
> as long as it exists, and return to the spawner when the thread completes.
> An embedded interpreter would also be given a block of context (in addition
> to the state structure). This would allow application context to be created
> and passed around. Pointers to shared memory structures, might be typical
> context in the embedded case.
>
> Context-2) Embedded Python interpreters could be spawned either as PyA
> threads or PyC threads. PyC threads would be limited to modules that are
> reentrant.
>
>
> I think that PyB and PyC are the visions that people see, which argue
> against implementing independent interpreters. PyB isn't truly independent,
> because data are shared, recoding is required, and performance suffers. Ick.
> PyC requires "recoding the whole library" potentially, if it is the only
> solution. PyD allows access to the whole standard library of modules,
> exactly like today, but the existing limitations still obtain for PyA
> threads using that model – very limited concurrency. But PyC threads would
> execute in their own little environments, and not need locking. Pure Python
> code would be immediately happy there. Properly coded (reentrant,
> global-free) extensions would be happy there. Lots of work could be done
> there, to use up multi-core/multi-CPU horsepower (shared-memory
> architecture).
>
> Questions for people that know the Python internals: Is PyD possible? How
> hard? Is a PyC thread an effective way of implementing a Python sandbox?
> If it is, and if it would attract the attention of Brett Cannon, who at least
> once wanted to do a thesis on Python sandboxes, he could be a helpful
> supporter.
>
> Questions for Andy: is the type of work you want to do in independent
> threads mostly pure Python? Or with libraries that you can control to some
> extent? Are those libraries reentrant? Could they be made reentrant? How
> much of the Python standard library would need to be available in reentrant
> mode to provide useful functionality for those threads? I think you want PyC
>
> Questions for Patrick: So if you had a Python GUI using the whole standard
> library -- would it likely run fine in PyA threads, and still be able to
> use PyC threads for the audio scripting language? Would it be a problem for
> those threads to have limited library support (only reentrant modules)?
>
>> That's the rub... In our case, we're doing image and video
>> manipulation--stuff not good to be messaging from address space to
>> address space. The same argument holds for numerical processing with
>> large data sets. The workers handing back huge data sets via
>> messaging isn't very attractive.
>>
>
> In the module multiprocessing environment could you not use shared memory,
> then, for the large shared data items?
>
>> Our software runs in real time (so performance is paramount),
>> interacts with other static libraries, depends on worker threads to
>> perform real-time image manipulation, and leverages Windows and Mac OS
>> API concepts and features. Python's performance hits have generally
>> been a huge challenge with our animators because they often have to go
>> back and massage their python code to improve execution performance.
>> So, in short, there are many reasons why we use python as a part
>> rather than a whole.
>>
>> The other area of pain that I mentioned in one of my other posts is
>> that what we ship, above all, can't be flaky.
>> The lack of module cleanup (intended to be addressed by PEP 3121),
>> using a duplicate copy of the python dynamic lib, and namespace black
>> magic to achieve independent interpreters are all examples that have
>> made using python for us much more challenging and time-consuming than
>> we ever anticipated.
>>
>> Again, if it turns out nothing can be done about our needs (which
>> appears to be more and more like the case), I think it's important for
>> everyone here to consider the points raised here in the last week.
>> Moreover, realize that the python dev community really stands to gain
>> from making python usable as a tool (rather than a monolith). This
>> fact alone has caused lua to *rapidly* rise in popularity with
>> software companies looking to embed a powerful, lightweight
>> interpreter in their software.
>>
>> As a python language fan and enthusiast, don't let lua win! (I say
>> this endearingly of course--I have the utmost respect for both
>> communities and I only want to see CPython be an attractive pick when
>> a company is looking to embed a language that won't intrude upon their
>> app's design).
>>
>
> Thanks for the further explanations.
>
> --
> Glenn -- http://nevcal.com/
> ===========================
> A protocol is complete when there is nothing left to remove.
> -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
>
> --
> http://mail.python.org/mailman/listinfo/python-list
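
For what it's worth, here is a rough sketch of the shared-memory idea Glenn asks about in the quoted text, using the standard multiprocessing module rather than any PyC-style interpreter (which does not exist). The names invert_slice and make_shared_state are made up for illustration. The RawArray stands in for Data-3 (unlocked shared data whose handoff the application manages itself), and the Value stands in for Data-2 (shared data that carries its own lock). Each worker is assumed to own a disjoint slice, so the buffer itself needs no lock.

```python
import multiprocessing as mp


def make_shared_state(size):
    """Create the shared objects (hypothetical helper for this sketch)."""
    pixels = mp.RawArray("B", size)  # Data-3 style: shared bytes, no lock
    done = mp.Value("i", 0)          # Data-2 style: shared int with a lock
    return pixels, done


def invert_slice(pixels, done, start, stop):
    """Worker: invert pixels[start:stop] in place, then report progress."""
    for i in range(start, stop):
        pixels[i] = 255 - pixels[i]
    # Only the small counter is locked; the big buffer is not.
    with done.get_lock():
        done.value += stop - start


if __name__ == "__main__":
    size = 8
    pixels, done = make_shared_state(size)
    half = size // 2
    workers = [
        mp.Process(target=invert_slice, args=(pixels, done, 0, half)),
        mp.Process(target=invert_slice, args=(pixels, done, half, size)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(list(pixels), done.value)
```

The large buffer lives in shared memory instead of being sent through a queue, which is the part Andy's image and video case cares about. Whether this is fast enough for real-time work is not something the sketch shows.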