Hi Andy,

Andy wrote:
> However, we require true thread/interpreter
> independence so python 2 has been frustrating at time, to say the
> least. Please don't start with "but really, python supports multiple
> interpreters" because I've been there many many times with people.
> And, yes, I'm aware of the multiprocessing module added in 2.6, but
> that stuff isn't lightweight and isn't suitable at all for many
> environments (including ours).

This is a very conflicting set of statements. Whilst you appear to be extremely clear on what you want here, and on why multiprocessing and associated techniques are not appropriate, it does sound conflicting. I'm guessing I'm not the only person who finds this a little odd.

Based on the size of the thread, having read it all, I'm guessing also that you're not going to have an immediate solution but a workaround. However, also based on reading it, I think it's a use case that would be generally useful in embedding python.

So, I'll give it a stab as to what I think you're after.

The scenario as I understand it is this:
   * You have an application written in C, C++ or similar.
   * You've been providing users the ability to script it or customise it in some fashion using scripts.

Based on the conversation:
   * This worked well, and you really liked the results, but...
   * You only had one interpreter embedded in the system
   * You were allowing users to use multiple scripts

Suddenly you go from:
   Single script, single memory space.

To:
   Multiple scripts, unconstrained shared memory space.

That then causes pain for you and your users. So as a result, you decided to look for this scenario:
   * A mechanism that allows each script to think it's the only script running on the python interpreter.
   * But to still have only one embedded instance of the interpreter.
   * With the primary motivation to eliminate the unconstrained shared memory causing breakage to your software.

So, whilst the multiprocessing module gives you this:
   * With the primary motivation to eliminate the unconstrained shared memory causing breakage to your software.

It's (for whatever reason) too heavyweight for you, due to the multiprocess usage. At a guess the reason for this is because you allow the user to run lots of these little scripts.

Essentially what this means is that you want "green processes".

One workaround for achieving that may be to find a way to force threads in python to ONLY be allowed access to (and only update) thread local values, rather than default to shared values.

The reason I say that is because the closest you get to green processes in python at the moment is /inside/ a python generator. It's nowhere near the level you want, but it's what made me think of the idea of green processes.

Specifically if you have the canonical example of a python generator:

    def fib():
        a, b = 1, 1
        while 1:
            yield a
            a, b = b, a+b

Then no matter how many times I run that, the values are local, and can't impact each other. Now clearly this isn't what you want, but on some level it's *similar*.

You want to be able to do:
    run(this_script)

and then when (this_script) is running only use a local environment.

Now, if you could change the threading API, such that there was a means of forcing all value lookups to look in thread local store before looking outside the thread local store [1], then this would give you a much greater level of safety.

[1] I don't know if there is or isn't; I've not been sufficiently interested to look...

I suspect that this would also be a very nice easy win for many multi-threaded applications as well, reducing accidental data sharing.

Indeed, reversing things such that rather than doing this:

    myLocal = threading.local()
    myLocal.X = 5

Allowing a thread to force the default to be the other way round:

    systemGlobals = threading.globals()
    systemGlobals.X = 5

Would make a big difference.
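
To make the difference between today's default (shared values, opt-in thread locals) a bit more concrete, here's a minimal sketch using only what the standard library already provides. The names `shared`, `local_state`, `worker` and `run_workers` are made up for illustration, and this only shows the existing `threading.local()` behaviour, not the "thread local by default" lookup suggested above, which as far as I know doesn't exist.

```python
import threading

# Illustrative names only; none of these come from the original thread.
shared = {"X": None}              # ordinary module-level state, seen by all threads
local_state = threading.local()   # attributes set here are private to each thread


def worker(value, barrier, local_seen, shared_seen):
    shared["X"] = value           # every thread overwrites the same slot
    local_state.X = value         # each thread gets its own X
    barrier.wait()                # wait until every thread has written
    local_seen[value] = local_state.X
    shared_seen[value] = shared["X"]


def run_workers(n=4):
    barrier = threading.Barrier(n)
    local_seen, shared_seen = {}, {}
    threads = [
        threading.Thread(target=worker, args=(i, barrier, local_seen, shared_seen))
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread never assigned local_state.X, so it has no such attribute.
    return local_seen, shared_seen, hasattr(local_state, "X")
```

Each thread reads back its own `local_state.X`, whereas they all end up seeing whichever value happened to be written to `shared["X"]` last. That second behaviour is the accidental sharing a "thread local first" mode would be trying to avoid.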

Furthermore, it would also mean that the following:

    import MyModule
    from MyOtherModule import whizzy thing

I don't know if such a change would be sufficient to stop the python interpreter going bang for extension modules though :-)

I suspect also that this change, whilst potentially fraught with difficulties, would be incredibly useful in python implementations that are GIL-free (such as Jython or IronPython).

Now, this for me is entirely theoretical because I don't know much about python's threading implementation (because I've never needed to), but it does seem to me to be an easier win than looking for truly independent interpreters...

It would also be more generally useful, since it would make accidental sharing of data (which is where threads really hurt people most) much harder.

Since it was raised in the thread, I'd like to say "use Kamaelia", but your use case is slightly different as I understand it. You want to take existing stuff that won't be written in any particular way, and encourage it to be safely reusable in a shared environment. We do do that to an extent, but I'm guessing not in quite as unconstrained a way as you. (We specifically require usage of things in a lightly constrained manner.)

I suspect though that this hypothetical ability to switch a thread to search thread locals first (or only have thread locals) would itself be incredibly useful as time goes on.

Kamaelia implements the kind of model that this paper referenced in the thread advocates:
    http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

As you'll see from this recent Pycon UK presentation:
    http://tinyurl.com/KamaeliaPyconUK

It goes a stage further though by actively providing metaphors based around components built using inboxes/outboxes designed *specifically* to encourage safe concurrency.
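
For a rough flavour of the inbox/outbox idea, here's a minimal sketch built from plain generators and deques. To be clear, this is NOT Kamaelia's actual API; `producer`, `doubler` and `run_pipeline` are hypothetical names invented for this illustration, and the "scheduler" is just a loop.

```python
from collections import deque


def producer(outbox, items):
    # Sends each item to its outbox, giving up control after every send.
    for item in items:
        outbox.append(item)
        yield


def doubler(inbox, outbox):
    # Only ever touches its own inbox and outbox, never another
    # component's internal state.
    while True:
        while inbox:
            outbox.append(inbox.popleft() * 2)
        yield


def run_pipeline(items):
    link = deque()      # producer's outbox wired to doubler's inbox
    result = deque()    # doubler's outbox
    source = producer(link, items)
    sink = doubler(link, result)
    for _ in source:    # one step of the producer...
        next(sink)      # ...then one step of the consumer
    next(sink)          # drain anything still waiting in the inbox
    return list(result)
```

The point is that the only thing the two components share is the explicit link between them, so there's no accidental sharing to go wrong.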

(Heritage wise, Kamaelia owes more to occam & CSP than anything else.)

After all, we've found that concurrency using generators is good most of the time - it's probably the most fundamental unit of concurrency you can get, followed by true coroutines (greenlets). Next up is threads (you can put generators into threads, but not vice versa). Next up is processes (you can put threads in processes, but not vice versa).

Finishing on a random note:

The interesting thing from my perspective is you essentially want something half way between threads and processes, which I called green processes for want of a decent phrase. Now that's akin to sandboxing, but I suspect leaky sandboxing might be sufficient for you. (ie a sandbox where you have to try hard to break out of the box, as opposed to it being trivial)

I'd be pretty certain that something like green processes, or "thread local only", would be useful in the future.

After all, that along with decent sandboxing would be the sort of thing necessary to allow python to be embedded in a browser. (If flash used multiple processes, it'd kill most people's systems after all, and if they don't have something like green processes, flash would make web pages even worse...)

Indeed, thread local only and globals accessed via STM [1] would be incredibly handy. (I say that because generator globals and globals accessed via a CAT (which is a Kamaelia specific thing, but similar conceptually) work extremely well.)

[1] even something as lightweight as http://www.kamaelia.org/STM

If a "search thread local" approach or "thread local only" approach sounds reasonable, then it may be that a "leaky sandbox" approach is worth investigating. After all, a leaky sandbox may be doable.

Tuppence-worthy-ly-yours,

Michael.
--
http://www.kamaelia.org/GetKamaelia