Re: 2.6, 3.0, and truly independent interpreters
On Nov 5, 5:09 pm, Paul Boddie [EMAIL PROTECTED] wrote:

> Anyway, to keep things constructive, I should ask (again) whether you looked at tinypy [1] and whether that might possibly satisfy your embedded requirements.

Actually, I'm starting to get into the tinypy codebase and have been talking in detail with the leads for that project (I just branched it, in fact). TP indeed has all the right ingredients for a CPythonES API, so I'm currently working on a first draft. Interestingly, the TP VM is largely based on Lua's implementation and stresses compactness. One challenge is that its design may be overly compact, making it a little tricky to extend and maintain (but I anticipate things will improve as we rev it).

When I have a draft of this CPythonES API, I plan to post it here for everyone to look at and give feedback on. The only thing that sucks is that I have a lot of other commitments right now, so I can't spend the time on this that I'd like to. Once we have that API finalized, I'll be able to start offering some bounties for filling in some of its implementation.

In any case, I look forward to updating folks here on our progress!

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Nov 6, 8:25 am, sturlamolden [EMAIL PROTECTED] wrote:

> On Nov 5, 8:44 pm, Andy O'Meara [EMAIL PROTECTED] wrote:
> > In a few earlier posts, I went into detail about what's meant there: http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
> All this says is:
> 1. The cost of serialization and deserialization is too large.
> 2. Complex data structures cannot be placed in shared memory.
> The first claim is unsubstantiated. It depends on how much and what you serialize.

Right, but I'm telling you that it *is* substantial... Unfortunately, you can't serialize thousands of opaque OS objects (which undoubtedly contain sub-allocations and pointers) in a frame-based, performance-centric app. Please consider that others (such as myself) are not trying to be difficult here--it turns out that we're actually professionals. Again, I'm not the type to compare credentials, but it would be nice if you considered that you aren't the final authority on real-time professional software development.

> The second claim is plain wrong. You can put anything you want in shared memory. The mapping address of the shared memory segment may vary, but it can be dealt with (basically use integers instead of pointers, and use the base address as offset.)

I explained this in other posts: OS objects are opaque and their serialization has to be done via their APIs, which is never marketed as being fast *OR* cheap. I've gone into this many times and in many posts.

> Saying that it can't be done is silly before you have tried.

Your attitude and unwillingness to look at the use cases listed by myself and others in this thread shows that this discussion may not be a good use of your time. In any case, you haven't even acknowledged that a package can't wag the dog when it comes to app development--and that's the bottom line and root liability.

Andy
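For readers following along, here is a minimal sketch of the base-plus-offset technique sturlamolden describes (the POSIX calls are real; the node layout and segment name are invented for illustration):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* A node that links to the next node by OFFSET, not pointer, so
       the structure stays valid at any mapping address in any process. */
    typedef struct {
        int    value;
        size_t next_off;    /* offset from segment base; 0 = end */
    } shm_node;

    int main(void)
    {
        size_t size = 4096;
        int fd = shm_open("/demo_seg", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, size);
        char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);

        /* Build a two-node list using offsets relative to 'base'. */
        shm_node *a = (shm_node *)base;
        shm_node *b = (shm_node *)(base + sizeof(shm_node));
        a->value = 1; a->next_off = (size_t)((char *)b - base);
        b->value = 2; b->next_off = 0;

        /* Any mapping of the segment, at any address, can walk it. */
        for (shm_node *n = a; ; n = (shm_node *)(base + n->next_off)) {
            printf("%d\n", n->value);
            if (n->next_off == 0)
                break;
        }
        munmap(base, size);
        shm_unlink("/demo_seg");
        return 0;
    }

Note that this only works for flat data you lay out yourself, which is exactly the objection raised above: an opaque codec or OS object lands wherever its own API's allocator puts it, not in your shared segment.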
Re: 2.6, 3.0, and truly independent interpreters
On Nov 6, 9:02 pm, sturlamolden [EMAIL PROTECTED] wrote:

> On Nov 7, 12:22 am, Walter Overby [EMAIL PROTECTED] wrote:
> > I read Andy to stipulate that the pipe needs to transmit hundreds of megs of data and/or thousands of data structure instances. I doubt he'd be happy with memcpy either. My instinct is that contention for a lock could be the quicker option.
> If he needs to communicate that amount of data very often, he has a serious design problem.

Hmmm... Your comment there seems to be an indicator that you don't have a lot of experience with real-time, performance-centric apps. Consider my previously listed examples of video rendering and programmatic effects in real-time. You need to have a lot of stuff in threads being worked on, and as Walter described, using a signal rather than serialization is the clear choice. Or, consider Patrick's case where you have massive amounts of audio being run through a DSP--it just doesn't make sense to serialize an intricate, high-level object when you could otherwise just hand it off via a single sync step. Walter and Paul really get what's being said here, so that should be an indicator to take a step back for a moment and ease up a bit... C'mon, man--we're all on the same side here! :^)

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Nov 4, 10:59 am, sturlamolden [EMAIL PROTECTED] wrote:

> On Nov 4, 4:27 pm, Andy O'Meara [EMAIL PROTECTED] wrote:
> > People in the scientific and academic communities have to understand that the dynamics in commercial software can be *very* different, and they have to show some open-mindedness there.
> You are aware that the BDFL's employer is a company called Google? Python is not just used in academic settings.

Turns out I have heard of Google (and how about you be a little more courteous). If you've read the posts in this thread, you'll note that the needs outlined in this thread are quite different than the needs and interests of Google. Note that my point was that python *could* and *should* be used more in end-user/desktop applications, but it can't wag the dog, to use my earlier statement.

> Furthermore, I gave you a link to cilk++. This is a simple tool that allows you to parallelize existing C or C++ software using three small keywords.

Sorry if it wasn't clear, but we need the features associated with an embedded interpreter. I checked out cilk++ when you linked it and although it seems pretty cool, it's not a good fit for us for a number of reasons. Also, we like the idea of helping support a FOSS project rather than license a proprietary product (again, to be clear, using cilk isn't even appropriate for our situation).

> > As other posts have gone into extensive detail, multiprocessing unfortunately doesn't handle the massive/complex data structures situation (see my posts regarding real-time video processing).
> That is something I don't believe. Why can't multiprocessing handle that?

In a few earlier posts, I went into detail about what's meant there:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/9d995e4a1153a1b2/09aaca3d94ee7a04?lnk=st#09aaca3d94ee7a04
http://groups.google.com/group/comp.lang.python/msg/edae2840ab432344
http://groups.google.com/group/comp.lang.python/msg/5be213c31519217b

> For Christ sake, researchers write global climate models using MPI. And you think a toy problem like 'real-time video processing' is a show stopper for using multiple processes.

I'm not sure why you're posting this sort of stuff when it seems like you haven't checked out the earlier posts in this thread. Also, you do yourself and the people here a disservice in the way that you're speaking to me here. You never know who you're really talking to or who's reading.

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Nov 4, 9:38 am, sturlamolden [EMAIL PROTECTED] wrote:

> First let me say that there are several solutions to the multicore problem. Multiple independent interpreters embedded in a process is one possibility, but not the only one.

No one disagrees there. However, the motivation of this thread has been to make people here consider that it's much more preferable for CPython to have as few restrictions as possible on how it's used. I think many people here assume that python is the showcase item in industrial and commercial use, but it's generally just one of many pieces of machinery that serve the app's function (so the tail can't wag the dog when it comes to app design). Some people in this thread have made comments such as "make your app run in python" or "change your app requirements", but in the world of production schedules and making sure payroll is met, those options just can't happen. People in the scientific and academic communities have to understand that the dynamics in commercial software can be *very* different, and they have to show some open-mindedness there.

> The multiprocessing package has almost the same API as you would get from your suggestion, the only difference being that multiple processes are involved.

As other posts have gone into extensive detail, multiprocessing unfortunately doesn't handle the massive/complex data structures situation (see my posts regarding real-time video processing). I'm not sure if you've followed all the discussion, but multiple processes are off the table (this is discussed at length, so just flip back into the thread history).

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 30, 11:09 pm, alex23 [EMAIL PROTECTED] wrote:

> On Oct 31, 2:05 am, Andy O'Meara [EMAIL PROTECTED] wrote:
> > I don't follow you there. If you're referring to multiprocessing, our concerns are:
> > - Maturity (am I willing to tell my partners and employees that I'm betting our future on a brand-new module that imposes significant restrictions as to how our app operates?)
> > - Liability (am I ready to invest our resources into lots of new python module-specific code to find out that a platform that we want to target isn't supported or has problems?)
> > Like it or not, we're a company and we have to show sensitivity about new or fringe packages that make our codebase less agile -- C/C++ continues to win the day in that department.
> I don't follow this...wouldn't both of these concerns be even more true for modifying the CPython interpreter to provide the functionality you want?

A great point, for sure. So, basically, the motivation and goal of this entire thread is to get an understanding of how enthusiastic/interested the CPython dev community is in the concepts/enhancements under discussion, and for all of us to better understand the root issues. So my response is basically that it was my intention to seek official/sanctioned development (and contribute direct developer support and compensation). My hope was that the increasing interest and value associated with flexible, multi-core/free-thread support is at a point where there's a critical mass of CPython developer interest (as indicated by various serious projects specifically meant to offer this support). Unfortunately, based on the posts in this thread, it's becoming clear that the scale of code changes, design changes, and testing that are necessary in order to offer this support is just too large unless the entire community is committed to the cause.

Meanwhile, as many posts in the thread have pointed out, issues such as free threading and easy/clean/compartmentalized use of python are of rising importance to app developers shopping for an interpreter to embed. So unless/until CPython offers the flexibility some apps require as an embedded interpreter, we commercial guys are unfortunately forced to use alternatives to python. I just think it'd be a huge win for everyone (app developers, the python dev community, and python proliferation in general) if python made its way into more commercial and industrial applications (in an embedded capacity).

Andy
Re: 2.6, 3.0, and truly independent interpreters
> Okay, here's the bottom line:
> * This is not about the GIL. This is about *completely* isolated interpreters; most of the time when we want to remove the GIL we want a single interpreter with lots of shared data.
> * Your use case, although not common, is not extraordinarily rare either. It'd be nice to support.
> * If CPython had supported it all along we would continue to maintain it.
> * However, since it's not supported today, it's not worth the time invested, API incompatibility, and general breakage it would imply.
> * Although it's far more work than just solving your problem, if I were to remove the GIL I'd go all the way and allow shared objects.

Great recap (although saying it's not about the GIL may cause some people to lose track of the root issues here, but your following comment about GIL removal shows that we're on the same page).

> So there's really only two options here:
> * get a short-term bodge that works, like hacking the 3rd party library to use your shared-memory allocator. Should be far less work than hacking all of CPython.

The problem there is that we're not talking about a single 3rd party API/allocator--there are many, including the OS, which has its own internal allocators. My video encoding example is meant to illustrate a point, but the real-world use case is where there are allocators all over the place from all kinds of APIs, and where you want your C module to reenter the interpreter often to execute python helper code.

> * invest yourself in solving the *entire* problem (GIL removal with shared python objects).

Well, as I mentioned, I do represent a company willing and able to expend real resources here. However, as you pointed out, there's some serious work at hand here (sadly--it didn't have to be this way), and there seem to be some really polarized people here that don't seem as interested as I am in making python more attractive for app developers shopping for an interpreter to embed. From our point of view, there are two other options, which unfortunately seem more and more like the only way out the further we get into this discussion:

3) Start a new python implementation, let's call it "CPythonES", specifically targeting performance apps and using an explicit object/context concept to permit the free threading under discussion here. The idea would be to just implement the core language, feature set, and a handful of modules. I refer you to that list I made earlier of essential modules.

4) Drop python, switch to Lua.

The interesting thing about (3) is that it'd be in the same spirit as how OpenGL ES came to be (except that in place of the need for free threading was the fact that the standard OpenGL API was too overgrown and painful for the embedded scale). We're currently pursuing our own in-house version of (3), but we unfortunately have other priorities at the moment that would otherwise slow this down. Given the direction of many-core machines these days, option (3) or (4), for us, isn't a question of *if*, it's a question of *when*.

So that's basically where we're at right now. As to my earlier point about representing a company ready to spend real resources, please email me off-list if anyone here would have an interest in an open CPythonES project (and get full compensation). I can say for sure that we'd be able to lead with API framework design work--that's my personal strength and we have a lot of real-world experience there.

Andy
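To make option (3) concrete, here is a purely hypothetical sketch of what a CPythonES-style embedding surface might look like. Every name below is invented for illustration; no such API exists today. The point is simply that all state hangs off an explicit context, so two contexts on two threads share nothing and need no global lock:

    #include <stdlib.h>

    /* Hypothetical CPythonES-style API: every entry point takes an
     * explicit context. All names here are invented for illustration. */
    typedef struct cpes_Context {
        void *globals;          /* per-interpreter state lives here */
    } cpes_Context;

    cpes_Context *cpes_ContextNew(void)
    {
        return calloc(1, sizeof(cpes_Context));  /* an independent world */
    }

    int cpes_RunString(cpes_Context *ctx, const char *src)
    {
        (void)ctx; (void)src;   /* parse/execute against ctx only */
        return 0;
    }

    void cpes_ContextFree(cpes_Context *ctx)
    {
        free(ctx);              /* tear down just this world */
    }

    /* Worker threads each own a context outright -- no GIL anywhere,
     * because no state is shared for a lock to protect. */
    void *worker(void *arg)
    {
        (void)arg;
        cpes_Context *ctx = cpes_ContextNew();
        cpes_RunString(ctx, "result = render_frames(params)");
        cpes_ContextFree(ctx);
        return NULL;
    }

This mirrors the "make a context, work in it, take it down" convention discussed elsewhere in the thread for zlib and libpng.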
Re: 2.6, 3.0, and truly independent interpreters
On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:

> > Because then we're back into the GIL not permitting threads efficient core use on CPU bound scripts running on other threads (when they otherwise could).
> Why do you think so? For C code that is carefully written, the GIL allows *very well* to write CPU bound scripts running on other threads. (please do get back to Jesse's original remark in case you have lost the thread :-)

I don't follow you there. If you're referring to multiprocessing, our concerns are:

- Maturity (am I willing to tell my partners and employees that I'm betting our future on a brand-new module that imposes significant restrictions as to how our app operates?)
- Liability (am I ready to invest our resources into lots of new python module-specific code to find out that a platform that we want to target isn't supported or has problems?). Like it or not, we're a company and we have to show sensitivity about new or fringe packages that make our codebase less agile -- C/C++ continues to win the day in that department.
- Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real-world situations; otherwise you'd see end-user/common apps use forking more often than threading). It turns out that this isn't an exotic case at all: there's a *ton* of utility gained by making calls back into the interpreter. The best example is that since code is more easily maintained in python than in C, a lot of the module utility code is likely to be in python.

> You should really reconsider writing performance-critical code in Python.

I don't follow you there... Performance-critical code in Python?? Suppose you're doing pixel-level filters on images or video, or Patrick needs to apply a DSP to some audio... Our app's performance would *tank*, in a MAJOR way (that, and/or background tasks would take 100x+ longer to do their work).

> Regardless of the issue under discussion, a lot of performance can be gained by using flattened data structures, fewer pointers, less reference counting, fewer objects, and so on - in the inner loops of the computation. You didn't reveal what *specific* computation you perform, so it's difficult to give specific advice.

I tried to list some abbreviated examples in other posts, but here's some elaboration:

- Pixel-level effects and filters, where some filters may use C procs while others may call back into the interpreter to execute logic -- while some do both, multiple times.
- Image and video analysis/recognition where there's TONS of intricate data structures and logic. Those data structures and logic are easiest to develop and maintain in python, but you'll often want to call back to C procs which will, in turn, want to access Python (as well as C-level) data structures.

The common pattern here is where there's a serious mix of C and python code and data structures, BUT it can all be done with a free-thread mentality since the finish point is unambiguous and distinct -- where all the results are handed back to the main app in a black-and-white handoff. It's *really* important for an app to freely make calls into its interpreter (or the interpreter's data structures) without having to perform locking/unlocking, because that affords an app a *lot* of options and design paths. It's just not practical to be locking and unlocking the GIL every time you want to operate on python data structures or call back into python.
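To illustrate the locking churn being described, here is a minimal sketch of the standard CPython pattern (Py_BEGIN_ALLOW_THREADS and PyGILState_Ensure are the real API; the filter routine and helper callable are made up for the example): the C code releases the GIL for its heavy lifting, but every callback into the interpreter means stopping to retake it.

    #include <Python.h>

    static void crunch_pixels(int region) { (void)region; /* pure C work */ }

    /* Hypothetical frame filter: C crunches pixels, but a Python
     * helper decides per-region parameters. */
    static PyObject *filter_frame(PyObject *self, PyObject *args)
    {
        PyObject *helper;                    /* python callable */
        if (!PyArg_ParseTuple(args, "O", &helper))
            return NULL;

        Py_BEGIN_ALLOW_THREADS               /* drop the GIL... */
        for (int region = 0; region < 64; region++) {
            crunch_pixels(region);           /* runs truly parallel */

            /* ...but every call back into python stalls to retake it: */
            PyGILState_STATE g = PyGILState_Ensure();
            PyObject *r = PyObject_CallFunction(helper, "i", region);
            Py_XDECREF(r);
            PyGILState_Release(g);
        }
        Py_END_ALLOW_THREADS                 /* retake the GIL */
        Py_RETURN_NONE;
    }

With one interpreter per process this pattern works; the complaint above is that when the helper calls are frequent, the GIL round-trips dominate, and with multiple subinterpreters they all contend on the same lock.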
You seem to have placed the burden of proof on my shoulders for an app to deserve the ability to free-thread when using 3rd party packages, so how about we just agree it's not an unreasonable desire for a package (such as python) to support it and move on with the discussion.

> Again, if you do heavy-lifting in Python, you should consider rewriting the performance-critical parts in C. You may find that the need for multiple CPUs even goes away.

Well, the entire premise we're operating under here is that we're dealing with embarrassingly easy parallelization scenarios, so when you suggest that the need for multiple CPUs may go away, I'm worried that you're not keeping the big picture in mind. I appreciate your arguments that a PyC concept is a lot of work with some careful design work needed, but let's not kill the discussion just because of that.

> Any discussion in this newsgroup is futile, except when it either a) leads to a solution that is already possible, and the OP didn't envision, or b) is followed up by code contributions from one of the participants. If neither is likely to result, killing the discussion is the most productive thing we can do.

Well, most others here seem to have a lot different definition of what qualifies as a futile discussion, so how about you allow the rest of us to continue to discuss these issues and possible solutions. And, for the record, I've said multiple times I'm ready to
Re: 2.6, 3.0, and truly independent interpreters
On Oct 30, 1:00 pm, Jesse Noller [EMAIL PROTECTED] wrote:

> Multiprocessing is written in C, so as for the "less agile" - I don't see how it's any less agile than what you've talked about.

Sorry for not being more specific there, but by "less agile" I meant that an app's codebase is less agile if python is an absolute requirement. If I was told tomorrow that for some reason we had to drop python and go with something else, it's my job to have chosen a codebase path/roadmap such that my response back isn't just "well, we're screwed then." Consider modern PC games: they have huge code bases that use DirectX and OpenGL, and having a roadmap of flexibility is paramount, so packages they choose to use are used in a contained and hedged fashion. It's a survival tactic for a company not to entrench themselves in a package or technology if they don't have to (and that's what I keep trying to raise in the thread--that the python dev community should embrace development that makes python a leading candidate for lightweight use). Companies want to build flexible, powerful codebases that are married to as few components as possible.

> > - Shared memory -- for the reasons listed in my other posts, IPC or a shared/mapped memory region doesn't work for our situation (and I venture to say, for many real-world situations; otherwise you'd see end-user/common apps use forking more often than threading).
> I would argue that the reason most people use threads as opposed to processes is simply based on ease of use and entry (which is ironic, given how many problems it causes).

No, we're in agreement here -- I was just trying to offer a more detailed explanation of "ease of use". It's easy because memory is shared and no IPC, serialization, or special allocator code is required. And as we both agree, it's far from easy once those threads need to interact with each other. But again, my goal here is to stay on the embarrassingly easy parallelization scenarios.

> I would argue that most of the people taking part in this discussion are working on real world applications - sure, multiprocessing as it exists today, right now - may not support your use case, but it was evaluated to fit *many* use cases.

And as I've mentioned, it's a totally great endeavor to be super proud of. That suite of functionality alone opens some *huge* doors for python and I hope folks that use it appreciate how much time and thought undoubtedly had to go into it. You get total props, for sure, and your work is a huge and unique credit to the community.

> Please correct me if I am wrong in understanding what you want: You are making threads in another language (not via the threading API), embed python in those threads, but you want to be able to share objects/state between those threads, and independent interpreters. You want to be able to pass state from one interpreter to another via shared memory (e.g. pointers/contexts/etc).
>
> Example:
> ParentAppFoo makes 10 threads (in C)
> Each thread gets an itty bitty python interpreter
> ParentAppFoo gets an object (video) to render
> Rather than marshal that object, you pass a pointer to the object to the children
> You want to pass that pointer to an existing, or newly created, itty bitty python interpreter for mangling
> Itty bitty python interpreter passes the object back to a C module via a pointer/context
>
> If the above is wrong, I think possibly outlining it in the above form may help people conceptualize it - I really don't think you're talking about python-level processes or threads.

Yeah, you have it right-on there, with the added fact that the C and python execution (and data access) are highly intertwined (so getting and releasing the GIL would have to be happening all over). For example, consider the dynamics, logic, algorithms, and data structures associated with image and video effects and image and video recognition/analysis.

Andy
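For reference, here is roughly what Jesse's outline maps to in the existing C API (Py_NewInterpreter and friends are real, era-appropriate calls; the render() script and thread count are invented). The catch, as this thread keeps hitting, is that every one of these "itty bitty" interpreters still shares the one process-wide GIL, so the workers take turns instead of running free:

    #include <Python.h>
    #include <pthread.h>

    /* One app-level thread per core; each makes its own subinterpreter.
     * Note: all interpreters share the ONE process-wide GIL, which is
     * exactly the limitation under discussion. */
    static void *worker(void *arg)
    {
        (void)arg;  /* a real app would pass a job/frame pointer here */

        PyEval_AcquireLock();                    /* must hold the GIL */
        PyThreadState *ts = Py_NewInterpreter(); /* fresh subinterpreter */

        /* The frame data would change hands as a raw pointer, e.g.
         * wrapped in a PyCObject for the script to use. */
        PyRun_SimpleString("render()");          /* hypothetical script */

        Py_EndInterpreter(ts);
        PyEval_ReleaseLock();
        return NULL;
    }

    int main(void)
    {
        Py_Initialize();
        PyEval_InitThreads();
        PyThreadState *main_ts = PyEval_SaveThread(); /* drop the GIL */

        pthread_t t[2];
        for (int i = 0; i < 2; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 2; i++) pthread_join(t[i], NULL);

        PyEval_RestoreThread(main_ts);
        Py_Finalize();
        return 0;
    }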
Re: 2.6, 3.0, and truly independent interpreters
On Oct 26, 10:11 pm, James Mills [EMAIL PROTECTED] wrote:

> On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara [EMAIL PROTECTED] wrote:
> > I think we miscommunicated there--I'm actually agreeing with you. I was trying to make the same point you were: that intricate and/or large structures are meant to be passed around by a top-level pointer, not using serialization/messaging. This is what I've been trying to explain to others here; that IPC and shared memory unfortunately aren't viable options, leaving app threads (rather than child processes) as the solution.
> Andy, why don't you just use a temporary file system (ram disk) to store the data that your app is manipulating. All you need to pass around then is a file descriptor. --JamesMills

Unfortunately, it's the penalty of serialization and deserialization. When you're talking about stuff like memory-resident images and video (complete with their intricate and complex codecs), the only option is to be passing around a couple of pointers rather than take the hit of serialization (which is huge for video, for example). I've gone into more detail in some other posts but I could have missed something.

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 27, 4:05 am, Martin v. Löwis [EMAIL PROTECTED] wrote:

> Andy O'Meara wrote:
> > Well, when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Otherwise, please describe in detail how I'd get an opaque OS object (e.g. an OS ref that refers to memory-resident video) from the child process back to the parent process.
> WHAT PARENT PROCESS? "In the same address space", to me, means a single process only, not multiple processes, and no parent process anywhere. If you have just multiple threads, the notion of passing data from a "child process" back to the "parent process" is meaningless.

I know... I was just responding to you and others here who keep beating the "fork" drum. I was just trying to make it clear that a shared address space is the only way to go. Ok, good, so we're in agreement that threads are the only way to deal with the intricate and complex data set issue in a performance-centric application.

> > Again, the big picture that I'm trying to plant here is that there really is a serious need for truly independent interpreters/contexts in a shared address space.
> I understand that this is your mission in this thread. However, why is that your problem? Why can't you just use the existing (limited) multiple-interpreters machinery, and solve your problems with that?

Because then we're back into the GIL not permitting threads efficient core use on CPU-bound scripts running on other threads (when they otherwise could). Just so we're on the same page, "when they otherwise could" is relevant here because that's the important given: that each interpreter (context) truly never has any contact with the others. An example would be python scripts that generate video programmatically using an initial set of params and use an in-house C module to construct frames (which in turn make and modify python C objects that wrap intricate codec-related data structures). Suppose you wanted to render 3 of these at the same time, one on each thread (3 threads). With the GIL in place, these threads can't get anywhere close to their potential.

Your response thus far is that the C module should release the GIL before it commences its heavy lifting. Well, the problem is that during its heavy lifting it needs to call back into its interpreter. It turns out that this isn't an exotic case at all: there's a *ton* of utility gained by making calls back into the interpreter. The best example is that since code is more easily maintained in python than in C, a lot of the module utility code is likely to be in python. Unsurprisingly, this is the situation myself and many others are in: we want to subsequently use the interpreter within the C module (so, as I understand it, the proposal to have the C module release the GIL unfortunately doesn't work as a general solution).

> > For most industry-caliber packages, the expectation and convention (unless documented otherwise) is that the app can make as many contexts as it wants in whatever threads it wants because the convention is that the app must (a) never use one context's objects in another context, and (b) never use a context at the same time from more than one thread. That's all I'm really trying to look at here.
> And that's indeed the case for Python, too. The app can make as many subinterpreters as it wants to, and it must not pass objects from one subinterpreter to another one, nor should it use a single interpreter from more than one thread (although that is actually supported by Python - but it surely won't hurt if you restrict yourself to a single thread per interpreter).

I'm not following you there... I thought we were all in agreement that the existing C modules are FAR from being reentrant, regularly making use of static/global objects. The point I had made before is that other industry-caliber packages specifically don't have restrictions in *any* way. I appreciate your arguments that a PyC concept is a lot of work with some careful design work needed, but let's not kill the discussion just because of that. The fact remains that the video encoding scenario described above is a pretty reasonable situation, and as more people are commenting in this thread, there's an increasing need to offer apps more flexibility when it comes to multi-threaded use.

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 25, 9:46 am, M.-A. Lemburg [EMAIL PROTECTED] wrote:

> These discussions pop up every year or so and I think that most of them are not really all that necessary, since the GIL isn't all that bad.

Thing is, if the topic keeps coming up, then that may be an indicator that change is truly needed. Someone much wiser than me once shared that a measure of the usefulness and quality of a package (or API) is how easily it can be added to an application--of any flavor--without the application needing to change. So in the rising world of idle cores and worker threads, I do see an increasing concern over the GIL. Although I recognize that the debate is lengthy, heated, and has strong arguments on both sides, my reading on the issue makes me feel like there's a bias for the pro-GIL side because of the volume of design and coding work associated with considering various alternatives (such as Glenn's Py* concepts). And I DO respect and appreciate where the pro-GIL people come from: who the heck wants to do all that work and recoding so that a tiny percent of developers can benefit? And my best response is that as unfortunate as it is, python needs to be more multi-threaded-app friendly if we hope to attract the next generation of app developers that want to just drop python into their app (and not have to change their app around python). For example, Lua has that property, as evidenced by its rapidly growing presence in commercial software (Blizzard uses it heavily, for example).

> Furthermore, there are lots of ways to tune the CPython VM to make it more or less responsive to thread switches via the various sys.set*() functions in the sys module. Most computing or I/O intense C extensions, built-in modules and object implementations already release the GIL for you, so it usually doesn't get in the way all that often.

The main issue I take there is that it's often highly useful for C modules to make subsequent calls back into the interpreter. I suppose the response to that is to acquire the GIL before reentry, but it just seems to be more code and responsibility in scenarios where it's not necessary. Although that code and protocol may come easy to veteran CPython developers, let's not forget that an important goal is to attract new developers and companies to the scene, where they get their thread-independent code up and running using python without any unexpected reengineering. Again, why are companies choosing Lua over Python when it comes to an easy and flexible drop-in interpreter? And please take my points here to be exploratory, and not hostile or accusatory, in nature.

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 27, 10:55 pm, Glenn Linderman [EMAIL PROTECTED] wrote:

> And I think we still are miscommunicating! Or maybe communicating anyway! So when you said "object", I actually don't know whether you meant Python object or something else. I assumed Python object, which may not have been correct... but read on, I think the stuff below clears it up. Then when you mentioned thousands of objects, I imagined thousands of Python objects, and somehow transforming the blob into same... and back again.

My apologies to you and others here on my use of "objects" -- I'm using the term generically and mean it to *not* refer to python objects (for all the reasons discussed here). Python only makes up a small part of our app, hence my habit of using "objects" to refer to other APIs' allocated and opaque objects (including our own and OS APIs). For all the reasons we've discussed, in our world, python objects don't travel around outside of our python C modules -- when python objects need to be passed to other parts of the app, they're converted into their non-python (portable) equivalents (ints, floats, buffers, etc--but most of the time, the objects are PyCObjects, so they can enter and leave a python context with negligible overhead). I venture to say this is pretty standard when any industry app uses a package (such as python), for various reasons:

- Portability/Future (e.g. if we do decide to drop Python and go with Lua, the changes are limited to only one region of code).
- Sanity (having any API's objects show up in places far away goes against easy-to-follow code).
- MT flexibility (because we never use static/global storage, we have all kinds of options when it comes to multithreading). For example, recall that by throwing python in multiple dynamic libs, we were able to achieve the GIL-less interpreter independence that we want (albeit ghetto and a pain).

Andy
Re: 2.6, 3.0, and truly independent interpreters
Grrr... I posted a ton of lengthy replies to you and other recent posts here using Google and none of them made it, argh. Poof. There's nothing that fires me up more than lost work, so I'll have to revert to short and simple answers for the time being. Argh, damn.

On Oct 25, 1:26 am, greg [EMAIL PROTECTED] wrote:

> Andy O'Meara wrote:
> > I would definitely agree if there was a context (i.e. environment) object passed around then perhaps we'd have the best of all worlds.
> Moreover, I think this is probably the *only* way that totally independent interpreters could be realized. Converting the whole C API to use this strategy would be a very big project. Also, on the face of it, it seems like it would render all existing C extension code obsolete, although it might be possible to do something clever with macros to create a compatibility layer. Another thing to consider is that passing all these extra pointers around everywhere is bound to have some effect on performance.

I'm with you on all counts, so no disagreement there. On the "passing a ptr everywhere" issue, perhaps one idea is that all objects could have an additional field that would point back to their parent context (i.e. their interpreter). So the only prototypes that would have to be modified to contain the context ptr would be the ones that don't inherently operate on objects (e.g. importing a module).

On Oct 25, 1:54 am, greg [EMAIL PROTECTED] wrote:

> Andy O'Meara wrote:
> > - each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app.
> I hope you realize that starting up one of these interpreters is going to be fairly expensive. It will have to create its own versions of all the builtin constants and type objects, and import its own copy of all the modules it uses.

Yeah, for sure. And I'd say that's a pretty well established convention already out there for any industry package. The pattern I'd expect to see is where the app starts worker threads, starts interpreters in one or more of them, and throws jobs to different ones (and the interpreter would persist to move on to subsequent jobs).

> One wonders if it wouldn't be cheaper just to fork the process. Shared memory can be used to transfer large lumps of data if needed.

As I mentioned, when you're talking about intricate data structures, OS opaque objects (i.e. that have their own internal allocators), or huge data sets, even a shared memory region unfortunately can't fit the bill.

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:

> > > A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away.
> > ...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right?
> Wrong.

Let's take a step back and remind ourselves of the big picture. The goal is to have independent interpreters running in pthreads that the app starts and controls. Each interpreter never at any point is doing any thread-related stuff in any way. For example, each script job just does meat-and-potatoes CPU work, using callbacks that, say, programmatically use OS APIs to edit and transform frame data. So I think the disconnect here is that maybe you're envisioning threads being created *in* python. To be clear, we're talking about making threads at the app level and making it a given for the app to take its safety into its own hands.

> > As far as I can tell, it seems CPython's current state can't do CPU-bound parallelization in the same address space.
> That's not true.

Well, when you're talking about large, intricate data structures (which include opaque OS object refs that use process-associated allocators), even a shared memory region between the child process and the parent can't do the job. Otherwise, please describe in detail how I'd get an opaque OS object (e.g. an OS ref that refers to memory-resident video) from the child process back to the parent process. Again, the big picture that I'm trying to plant here is that there really is a serious need for truly independent interpreters/contexts in a shared address space.

Consider stuff like libpng, zlib, libjpg, or whatever; the use pattern is always the same: make a context object, do your work in the context, and take it down. For most industry-caliber packages, the expectation and convention (unless documented otherwise) is that the app can make as many contexts as it wants in whatever threads it wants because the convention is that the app must (a) never use one context's objects in another context, and (b) never use a context at the same time from more than one thread. That's all I'm really trying to look at here.

Andy
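Since zlib keeps coming up as the reference point, here is a minimal sketch of that exact pattern (the zlib calls are real; the wrapper function is invented): each call owns its z_stream context outright, so any number of threads can run it concurrently with no locking anywhere.

    #include <string.h>
    #include <zlib.h>

    /* Compress one buffer with a thread-local context -- the
     * "make a context, work in it, take it down" convention. */
    int compress_job(const unsigned char *in, size_t in_len,
                     unsigned char *out, size_t out_cap, size_t *out_len)
    {
        z_stream zs;
        memset(&zs, 0, sizeof zs);
        if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
            return -1;                         /* make the context */

        zs.next_in   = (unsigned char *)in;
        zs.avail_in  = (uInt)in_len;
        zs.next_out  = out;
        zs.avail_out = (uInt)out_cap;
        int rc = deflate(&zs, Z_FINISH);       /* work in the context */
        *out_len = zs.total_out;

        deflateEnd(&zs);                       /* take it down */
        return rc == Z_STREAM_END ? 0 : -1;
    }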
Re: 2.6, 3.0, and truly independent interpreters
> And in the case of hundreds of megs of data ... and I would be surprised at someone that would embed hundreds of megs of data into an object such that it had to be serialized... seems like the proper design is to point at the data, or a subset of it, in a big buffer. Then data transfers would just transfer the offset/length and the reference to the buffer.
> and/or thousands of data structure instances, ... and this is another surprise! You have thousands of objects (data structure instances) to move from one thread to another?

I think we miscommunicated there--I'm actually agreeing with you. I was trying to make the same point you were: that intricate and/or large structures are meant to be passed around by a top-level pointer, not using serialization/messaging. This is what I've been trying to explain to others here; that IPC and shared memory unfortunately aren't viable options, leaving app threads (rather than child processes) as the solution.

> Of course, I know that data get large, but typical multimedia streams are large, binary blobs. I was under the impression that processing them usually proceeds along the lines of keeping offsets into the blobs, and interpreting, etc. Editing is usually done by making a copy of a blob, transforming it or a subset in some manner during the copy process, resulting in a new, possibly different-sized blob.

Your instincts are right. I'd only add that when you're talking about data structures associated with an intricate video format, the complexity and depth of the data structures is insane -- the LAST thing you want to burn cycles on is serializing and deserializing that stuff (so IPC is out)--again, we're already on the same page here.

I think at one point you made the comment that shared memory is a solution to handle large data sets between a child process and the parent. Although this is certainly true in principle, it doesn't hold up in practice since complex data structures often contain 3rd party and OS API objects that have their own allocators. For example, in video encoding, there's TONS of objects that comprise memory-resident video from all kinds of APIs, so the idea of having them allocated from a shared/mapped memory block isn't even possible. Again, I only raise this to offer evidence that doing real-world work in a child process is a deal breaker--a shared address space is just way too much to give up.

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:

> > > A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away.
> > ...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right?
> Wrong.
> > (even if the interpreter is 100% independent, yikes).
> Again, wrong.
> > For example, have a python C module designed to programmatically generate images (and video frames) in RAM for immediate and subsequent use in animation. Meanwhile, we'd like to have a pthread with its own interpreter with an instance of this module and have it dequeue jobs as they come in (in fact, there'd be one of these threads for each excess core present on the machine).
> I don't understand how this example involves multiple threads. You mention a single thread (running the module), and you mention designing a module. Where is the second thread?

Glenn seems to be following me here... The point is to have as many threads as the app wants, each in its own world, running without restriction (performance-wise). Maybe the app wants to run a thread for each extra core on the machine. Perhaps the disconnect here is that when I've been saying "start a thread", I mean the app starts an OS thread (e.g. pthread) with the given that any contact with other threads is managed at the app level (as opposed to starting threads through python). So, as far as python knows, there's zero mention or use of threading in any way, *anywhere*.

> > As far as I can tell, it seems CPython's current state can't do CPU-bound parallelization in the same address space.
> That's not true.

Um... So let's say you have an opaque object ref from the OS that represents hundreds of megs of data (e.g. memory-resident video). How do you get that back to the parent process without serialization and IPC? What should really happen is just to use the same address space so just a pointer changes hands. THAT's why I'm saying that a separate address space is generally a deal breaker when you have large or intricate data sets (i.e. when performance matters).

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 9:40 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:

> > It seems to me that the very simplest move would be to remove global static data so the app could provide all thread-related data, which Andy suggests through references to the QuickTime API. This would suggest compiling python without thread support so as to leave it up to the application.
> I'm not sure whether you realize that this is not simple at all. Consider this fragment:
>
>     if (string == Py_None || index >= state->lastmark ||
>         !state->mark[index] || !state->mark[index+1]) {
>         if (empty)
>             /* want empty string */
>             i = j = 0;
>         else {
>             Py_INCREF(Py_None);
>             return Py_None;

The way to think about it is that, ideally in PyC, there are never any global variables. Instead, all globals are now part of a context (i.e. an interpreter) and it would presumably be illegal to ever use them in a different context. I'd say this is already the expectation and convention for any modern, industry-grade software package marketed as an extension for apps. Industry app developers just want to drop in a 3rd party package, make as many contexts as they want (in as many threads as they want), and expect to use each context without restriction (since they're ensuring contexts never interact with each other).

For example, if I use zlib, libpng, or libjpg, I can make as many contexts as I want and put them in whatever threads I want. In the app, the only thing I'm on the hook for is to: (a) never use objects from one context in another context, and (b) ensure that I never make any calls into a module from more than one thread at the same time. Both of these requirements are trivial to follow in the embarrassingly easy parallelization scenarios, and that's why I started this thread in the first place. :^)

Andy
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 10:24 pm, Glenn Linderman [EMAIL PROTECTED] wrote:

> And in the case of hundreds of megs of data ... and I would be surprised at someone that would embed hundreds of megs of data into an object such that it had to be serialized... seems like the proper design is to point at the data, or a subset of it, in a big buffer. Then data transfers would just transfer the offset/length and the reference to the buffer.
> and/or thousands of data structure instances, ... and this is another surprise! You have thousands of objects (data structure instances) to move from one thread to another?

Heh, no, we're actually in agreement here. I'm saying that in the case where the data sets are large and/or intricate, a single top-level pointer changing hands is *always* the way to go rather than serialization. For example, suppose you had some nifty python code and C procs that were doing lots of image analysis, outputting tons of intricate and rich data structures. Once the thread is done with that job, all that output is trivially transferred back to the appropriate thread by a pointer changing hands.

> Of course, I know that data get large, but typical multimedia streams are large, binary blobs. I was under the impression that processing them usually proceeds along the lines of keeping offsets into the blobs, and interpreting, etc. Editing is usually done by making a copy of a blob, transforming it or a subset in some manner during the copy process, resulting in a new, possibly different-sized blob.

No, you're definitely right-on, with the additional point that the representation of multimedia usually employs intricate and diverse data structures (imagine the data structure representation of a movie encoded in a modern codec, such as H.264, complete with paths, regions, pixel flow, geometry, transformations, and textures). As we both agree, that's something that you *definitely* want to move around via a single pointer (and not in a serialized form). Hence, my position that apps that use python can't be forced to go through IPC or else: (a) there's a performance/resource waste to serialize and deserialize large or intricate data sets, and (b) they're required to write and maintain serialization code that otherwise doesn't serve any other purpose.

Andy
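Here is a minimal sketch of that single-pointer handoff (pthreads; all names are invented for illustration): however intricate the result structure is, the transfer between threads is one pointer assignment under a lock, not a serialize/deserialize pass.

    #include <pthread.h>
    #include <stdlib.h>

    /* Arbitrarily rich analysis output -- never serialized. */
    typedef struct analysis_result analysis_result;

    static pthread_mutex_t  lock    = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t   ready   = PTHREAD_COND_INITIALIZER;
    static analysis_result *handoff = NULL;    /* one-slot "queue" */

    /* Worker thread: builds a huge structure, hands back one pointer. */
    void deliver(analysis_result *r)
    {
        pthread_mutex_lock(&lock);
        handoff = r;                     /* the entire transfer */
        pthread_cond_signal(&ready);
        pthread_mutex_unlock(&lock);
    }

    /* Main thread: blocks until the pointer arrives. */
    analysis_result *collect(void)
    {
        pthread_mutex_lock(&lock);
        while (handoff == NULL)
            pthread_cond_wait(&ready, &lock);
        analysis_result *r = handoff;
        handoff = NULL;
        pthread_mutex_unlock(&lock);
        return r;
    }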
Re: 2.6, 3.0, and truly independent interpreters
> Andy O'Meara wrote:
> > I would definitely agree if there was a context (i.e. environment) object passed around then perhaps we'd have the best of all worlds.
> Moreover, I think this is probably the *only* way that totally independent interpreters could be realized. Converting the whole C API to use this strategy would be a very big project. Also, on the face of it, it seems like it would render all existing C extension code obsolete, although it might be possible to do something clever with macros to create a compatibility layer. Another thing to consider is that passing all these extra pointers around everywhere is bound to have some effect on performance.

Good points--I would agree with you on all counts there. On the "passing a context everywhere" performance hit, perhaps one idea is that all objects could have an additional field that would point back to their parent context (i.e. their interpreter). So the only prototypes that would have to be modified to contain the context ptr would be the ones that inherently don't take any objects. This would conveniently and generally correspond to procs associated with interpreter control (e.g. importing modules, shutting down modules, etc).

> Andy O'Meara wrote:
> > - each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app.
> I hope you realize that starting up one of these interpreters is going to be fairly expensive.

Absolutely. I had just left that issue out in an effort to keep the discussion pointed, but it's a great point to raise. My response is that, like any 3rd party industry package, I'd say this is the expectation (that context startup and shutdown is non-trivial and should be minimized for performance reasons). For simplicity, my examples didn't talk about this issue, but in practice it'd be typical for apps to have their worker interpreters persist as they chew through jobs.

Andy
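Sketching that backpointer idea in hypothetical terms (nothing like this exists in CPython; every name is invented): calls that receive an object recover the context from the object itself, so only object-free entry points need the explicit context parameter.

    /* Hypothetical: every object carries a link to its interpreter. */
    typedef struct PyC_Context PyC_Context;     /* one per interpreter */

    typedef struct {
        long         refcount;
        PyC_Context *ctx;        /* backpointer to owning interpreter */
        /* ... type pointer, payload ... */
    } PyC_Object;

    /* Calls that receive an object need no extra parameter: */
    void PyC_Object_Touch(PyC_Object *ob)
    {
        PyC_Context *ctx = ob->ctx;   /* context recovered for free */
        (void)ctx;                    /* ...use per-context state here */
    }

    /* Only object-free entry points take the context explicitly: */
    PyC_Object *PyC_ImportModule(PyC_Context *ctx, const char *name);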
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 9:35 am, sturlamolden [EMAIL PROTECTED] wrote:

> Instead of appdomains (one interpreter per thread), or free threading, you could use multiple processes. Take a look at the new multiprocessing module in Python 2.6.

That's mentioned earlier in the thread.

> There is a fundamental problem with using homebrew loading of multiple (but renamed) copies of PythonXX.dll that is easily overlooked. That is, extension modules (.pyd) are DLLs as well.

Tell me about it--there's all kinds of problems and maintenance liabilities with our approach. That's why I'm here talking about this stuff.

> There are other options as well:
> - Use IronPython. It does not have a GIL.
> - Use Jython. It does not have a GIL.
> - Use pywin32 to create isolated outproc COM servers in Python. (I'm not sure what the effect of inproc servers would be.)
> - Use os.fork() if your platform supports it (Linux, Unix, Apple, Cygwin, Windows Vista SUA). This is the standard posix way of doing multiprocessing. It is almost unbeatable if you have a fast copy-on-write implementation of fork (that is, all platforms except Cygwin).

This is discussed earlier in the thread--they're unfortunately all out.
Re: 2.6, 3.0, and truly independent interpreters
On Oct 24, 2:12 am, greg [EMAIL PROTECTED] wrote:

> Andy wrote:
> > 1) Independent interpreters (this is the easier one--and solved, in principle anyway, by PEP 3121, by Martin v. Löwis)
> Something like that is necessary for independent interpreters, but not sufficient. There are also all the built-in constants and type objects to consider. Most of these are statically allocated at the moment.

Agreed--I was just trying to speak generally. Or, put another way, there's no hope for independent interpreters without the likes of PEP 3121. Also, as Martin pointed out, there's the issue of module cleanup, whose importance some guys here may underestimate (and I'm glad Martin pointed it out). Without the module cleanup, every time a dynamic library using python loads and unloads, you've got leaks. This issue is a real problem for us since our software is loaded and unloaded many, many times in a host app (iTunes, WMP, etc). I hadn't raised it here yet (and I don't want to turn the discussion to this), but lack of multiple load and unload support has been another painful issue that we didn't expect to encounter when we went with python.

> > 2) Barriers to "free threading". As Jesse describes, this is simply just the GIL being in place, but of course it's there for a reason. It's there because (1) doesn't hold and there was never any specs/guidance put forward about what should and shouldn't be done in multi-threaded apps
> No, it's there because it's necessary for acceptable performance when multiple threads are running in one interpreter. Independent interpreters wouldn't mean the absence of a GIL; it would only mean each interpreter having its own GIL.

I see what you're saying, but let's note that what you're talking about at this point is an interpreter containing protection from the client level violating (supposed) direction put forth in python multithreaded guidelines. Glenn Linderman's post really gets at what's at hand here. It's really important to consider that it's not a given that python (or any framework) has to be designed against hazardous use. Again, I refer you to the diagrams and guidelines in the QuickTime API: http://developer.apple.com/technotes/tn/tn2125.html -- they tell you point-blank what you can and can't do, and it's that simple. Their engineers can then simply create the implementation around those specs and not weigh any of the implementation down with sync mechanisms. I'm in the camp that simplicity and convention win the day when it comes to an API. It's safe to say that software engineers expect and assume that a thread that doesn't have contact with other threads (except for explicit, controlled message/object passing) will run unhindered and safely, so I raise an eyebrow at the GIL (or any internal helper sync stuff) holding up a thread's performance when the app is designed to not need lower-level global locks.

Anyway, let's talk about solutions. My company is looking to support a python dev community endeavor that allows the following:

- an app makes N worker threads (using the OS)
- each worker thread makes its own interpreter, pops scripts off a work queue, and manages exporting (and then importing) result data to other parts of the app. Generally, we're talking about CPU-bound work here.
- each interpreter has the essentials (e.g. math support, string support, re support, and so on -- I realize this is open-ended, but work with me here).

Let's guesstimate about what kind of work we're talking about here and whether this is even in the realm of possibility. If we find that it *is* possible, let's figure out what level of work we're talking about. From there, I can get serious about writing up a PEP/spec, paid support, and so on.

Regards,
Andy
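Since PEP 3121 keeps coming up, here is roughly what it buys in practice (this is the real Python 3 module-state API; the counter payload is invented): per-module state lives in a block the interpreter allocates and frees along with the module object, instead of in C statics, which is what makes clean unloading and per-interpreter copies possible.

    #include <Python.h>

    /* Per-module state replaces file-level statics (PEP 3121). */
    typedef struct { long counter; } module_state;

    static PyObject *bump(PyObject *module, PyObject *args)
    {
        module_state *st = (module_state *)PyModule_GetState(module);
        return PyLong_FromLong(++st->counter);
    }

    static PyMethodDef methods[] = {
        {"bump", bump, METH_NOARGS, "increment the per-module counter"},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef demo_def = {
        PyModuleDef_HEAD_INIT,
        "demo",                    /* module name */
        NULL,                      /* docstring */
        sizeof(module_state),      /* m_size: state allocated per module */
        methods,
        NULL, NULL, NULL,
        NULL                       /* m_free: cleanup hook on unload */
    };

    PyMODINIT_FUNC PyInit_demo(void)
    {
        return PyModule_Create(&demo_def);  /* state starts zeroed */
    }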
Re: 2.6, 3.0, and truly independent interpreters
> That aside, the fundamental problem is what I perceive as a fundamental design flaw in Python's C API. In Java JNI, each function takes a JNIEnv* pointer as its first argument. There is nothing that prevents you from embedding several JVMs in a process. Python can create embedded subinterpreters, but it works differently. It swaps subinterpreters like a finite state machine: only one is concurrently active, and the GIL is shared.

Bingo, it seems that you've hit it right on the head there. Sadly, that's why I regard this thread as largely futile (but I'm an optimist when it comes to cool software communities, so here I am). I've been afraid to say it for fear of getting mauled by everyone here, but I would definitely agree: if there was a context (i.e. environment) object passed around, then perhaps we'd have the best of all worlds. *winces*

> > This is discussed earlier in the thread--they're unfortunately all out.
> It occurs to me that tcl is doing what you want. Have you ever thought of not using Python?

Bingo again. Our research says that the options are tcl, perl (although it's generally untested and not recommended by the community--definitely dealbreakers for a commercial user like us), and lua. Also, I'd rather saw off my own right arm than adopt perl, so that's out. :^)

As I mentioned, we're looking to either (1) support a python dev community effort, (2) make our own high-performance python interpreter (that uses an env object as you described), or (3) drop python and go to lua. I'm favoring them in the order I list them, but the more I discuss the issue with folks here, the more people seem to be unfortunately very divided on (1).

Andy
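For contrast, the JNI shape being referenced looks like this (the JNI calls are real; the Java class and compute method are invented for the example): every operation threads an explicit JNIEnv* through, so the C caller never relies on hidden global interpreter state.

    #include <jni.h>

    /* Every JNI call takes the environment explicitly -- the
     * "context object passed around" design discussed above. */
    jint call_compute(JNIEnv *env, jobject worker)
    {
        /* 'compute' is an invented method name for illustration. */
        jclass    cls = (*env)->GetObjectClass(env, worker);
        jmethodID mid = (*env)->GetMethodID(env, cls, "compute", "()I");
        if (mid == NULL)
            return -1;                 /* pending exception lives in env */
        return (*env)->CallIntMethod(env, worker, mid);
    }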
Re: 2.6, 3.0, and truly independent intepreters
Glenn, great post and points! Andy seems to want an implementation of independent Python processes implemented as threads within a single address space, that can be coordinated by an outer application. This actually corresponds to the model promulgated in the paper as being most likely to succeed. Yeah, that's the idea--let the highest levels run and coordinate the show. It does seem simpler and more efficient to simply copy data from one memory location to another, rather than send it in a message, especially if the data are large. That's the rub... In our case, we're doing image and video manipulation--stuff that's not well suited to being messaged from address space to address space. The same argument holds for numerical processing with large data sets. The workers handing back huge data sets via messaging isn't very attractive. One thing Andy hasn't yet explained (or I missed) is why any of his application is coded in a language other than Python. Our software runs in real time (so performance is paramount), interacts with other static libraries, depends on worker threads to perform real-time image manipulation, and leverages Windows and Mac OS API concepts and features. Python's performance hits have generally been a huge challenge for our animators because they often have to go back and massage their python code to improve execution performance. So, in short, there are many reasons why we use python as a part rather than a whole. The other area of pain that I mentioned in one of my other posts is that what we ship, above all, can't be flaky. The lack of module cleanup (intended to be addressed by PEP 3121), using a duplicate copy of the python dynamic lib, and namespace black magic to achieve independent interpreters are all examples that have made using python much more challenging and time-consuming for us than we ever anticipated. Again, if it turns out nothing can be done about our needs (which looks more and more to be the case), I think it's important for everyone here to consider the points raised here in the last week. Moreover, realize that the python dev community really stands to gain from making python usable as a tool (rather than a monolith). That quality alone has caused lua to *rapidly* rise in popularity with software companies looking to embed a powerful, lightweight interpreter in their software. As a python language fan and enthusiast, don't let lua win! (I say this endearingly, of course--I have the utmost respect for both communities, and I only want to see CPython be an attractive pick when a company is looking to embed a language that won't intrude upon their app's design.) Andy -- http://mail.python.org/mailman/listinfo/python-list
Re: 2.6, 3.0, and truly independent intepreters
The Global Interpreter Lock is fundamentally designed to make the interpreter easier to maintain and safer: Developers do not need to worry about other code stepping on their namespace. This makes things thread-safe, inasmuch as having multiple PThreads within the same interpreter space modifying global state and variables at once is, well, bad. A c-level module, on the other hand, can sidestep/release the GIL at will, and go on its merry way and process away. ...Unless part of the C module execution involves the need to do CPU-bound work on another thread through a different python interpreter, right? (even if the interpreter is 100% independent, yikes). For example, take a python C module designed to programmatically generate images (and video frames) in RAM for immediate and subsequent use in animation. Meanwhile, we'd like to have a pthread with its own interpreter with an instance of this module and have it dequeue jobs as they come in (in fact, there'd be one of these threads for each excess core present on the machine). As far as I can tell, it seems CPython's current state can't support CPU-bound parallelization in the same address space (basically, it seems that we're talking about the embarrassingly parallel scenario raised in that paper). Why does it have to be in the same address space? Convenience and simplicity--the same reasons that most APIs let you hang yourself if the app does dumb things with threads. Also, when the data sets that you need to send to and from each process are large, using the same address space makes more and more sense. So, just to clarify - Andy, do you want one interpreter, $N threads (e.g. PThreads) or the ability to fork multiple heavyweight processes? Sorry if I haven't been clear, but we're talking about the app starting a pthread, making a fresh/clean/independent interpreter, and then being responsible for its safety at the highest level (with the payoff of each of these threads executing without hindrance). No different than if you used most APIs out there, where step 1 is always to make and init a context object and the final step is always to destroy/take-down that context object. I'm a lousy writer sometimes, but I feel bad if you took the time to describe threads vs processes. The only reason I raised IPC with my "messaging isn't very attractive" comment was to respond to Glenn Linderman's points regarding the tradeoffs of shared memory vs not. Andy -- http://mail.python.org/mailman/listinfo/python-list
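A sketch of the dequeue-jobs pattern Andy describes, assuming plain pthreads; the type and function names are invented for the example, and the per-thread interpreter creation is deliberately elided, since that independent-interpreter step is exactly the missing piece under discussion:

    #include <pthread.h>
    #include <stdlib.h>

    /* One worker pthread per spare core pops jobs off a shared queue.
       Jobs are handed over by pointer--no copying, no serialization. */
    typedef struct job { struct job *next; void *frame; } job_t;

    static job_t          *queue_head = NULL;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;

    void submit(void *frame)              /* producer: O(1) handoff */
    {
        job_t *j = malloc(sizeof *j);     /* error checks elided    */
        j->frame = frame;
        pthread_mutex_lock(&queue_lock);
        j->next = queue_head;
        queue_head = j;
        pthread_cond_signal(&queue_cond); /* the single sync step   */
        pthread_mutex_unlock(&queue_lock);
    }

    void *worker(void *unused)
    {
        (void)unused;
        /* ...would create its own independent interpreter here...  */
        for (;;) {
            pthread_mutex_lock(&queue_lock);
            while (queue_head == NULL)
                pthread_cond_wait(&queue_cond, &queue_lock);
            job_t *j = queue_head;
            queue_head = j->next;
            pthread_mutex_unlock(&queue_lock);

            /* ...run the image-generation script on j->frame...    */
            free(j);
        }
        return NULL;
    }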
Re: 2.6, 3.0, and truly independent intepreters
Another great post, Glenn!! Very well laid-out and posed!! Thanks for taking the time to lay all that out. Questions for Andy: is the type of work you want to do in independent threads mostly pure Python? Or with libraries that you can control to some extent? Are those libraries reentrant? Could they be made reentrant? How much of the Python standard library would need to be available in reentrant mode to provide useful functionality for those threads? I think you want PyC I think you've defined everything perfectly, and you're of course correct about my love for the PyC model. :^) Like any software that's meant to be used without restrictions, our code and frameworks always use a context object pattern so that there's never any non-const global/shared data. I would go as far as to say that this is the case with more performance-oriented software than you may think, since it's usually a given for us to have to be parallel-friendly in as many ways as possible. Perhaps Patrick can back me up there. As to what modules are essential... As you point out, once reentrant module implementations caught on in a PyC or hybrid world, I think we'd start to see real effort to whip them into compliance--there's just so much to be gained, imho. But to answer the question, there are the obvious ones (operator, math, etc), string/buffer processing (string, re), C bridge stuff (struct, array), and OS basics (time, file system, etc). Nice-to-haves would be buffer and image decompression (zlib, libpng, etc), crypto modules, and xml. As far as I can imagine, all of these modules already contain little, if any, global data, so I have to believe they'd be super easy to make PyC-friendly. Patrick, what would you see you guys using? That's the rub... In our case, we're doing image and video manipulation--stuff that's not well suited to being messaged from address space to address space. The same argument holds for numerical processing with large data sets. The workers handing back huge data sets via messaging isn't very attractive. In the multiprocessing module environment, could you not use shared memory, then, for the large shared data items? As I understand things, multiprocessing puts stuff in a child process (i.e. a separate address space), so the only way to get stuff to/from it is via IPC, which can include a shared/mapped memory region. Unfortunately, a shared address region doesn't work when you have large and opaque objects (e.g. a rendered CoreVideo movie in the QuickTime API or 300 megs of audio data that just went through a DSP). Then you've got the hit of serialization if you've got intricate data structures (that would normally need to be serialized, such as a hashtable or something). Also, if I may speak for commercial developers out there who are just looking to get the job done without new code, it's usually preferable to just use a single high-level sync object (for when the job is complete) than to start a child process and use IPC. The former is just WAY less code, plain and simple. Andy -- http://mail.python.org/mailman/listinfo/python-list
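A tiny sketch of the context-object pattern Andy describes (the state and constants are invented for the example): all state hangs off a caller-owned struct, so the code is reentrant and two threads using two contexts never touch shared data.

    /* Per-context pseudo-random generator: no globals anywhere, so
       any number of threads can each own a context safely. */
    typedef struct {
        unsigned seed;                 /* all module state lives here */
    } rng_ctx;

    void rng_init(rng_ctx *ctx, unsigned seed)
    {
        ctx->seed = seed;
    }

    unsigned rng_next(rng_ctx *ctx)    /* reentrant: touches only ctx */
    {
        ctx->seed = ctx->seed * 1664525u + 1013904223u;
        return ctx->seed;
    }

A module written this way needs no lock at all as long as each thread sticks to its own context--which is the convention-over-enforcement trade the QuickTime guidelines mentioned earlier also make.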
Re: 2.6, 3.0, and truly independent intepreters
Are you familiar with the API at all? Multiprocessing was designed to mimic threading in about every way possible; the only restriction on shared data is that it must be serializable, but even then you can override or customize the behavior. Also, interprocess communication is done via pipes. It can also be done with messages if you want to tweak the manager(s). I apologize in advance if I don't understand something correctly, but as I understand them, everything has to be serialized in order to go through IPC. So when you're talking about thousands of objects, buffers, and/or large OS opaque objects (e.g. memory-resident video and images), that seems like a pretty rough hit on run-time resources. Please don't misunderstand my comments to suggest that multiprocessing isn't great stuff. On the contrary, it's very impressive and it singlehandedly catapults python *way* closer to efficient CPU-bound processing than it ever was before. All I mean to say is that in the case where you're using a shared address space with a worker pthread per spare core to do CPU-bound work, it's a really big win not to have to serialize stuff. And in the case of hundreds of megs of data and/or thousands of data structure instances, it's a deal breaker to serialize and deserialize everything just so that it can be sent through IPC. It's a deal breaker for most performance-centric apps because of the unnecessary runtime resource hit and because now all those data structures being passed around have to have accompanying serialization code written (and maintained) for them. That's actually what I meant when I made the comment that a high-level sync object in a shared address space is better than sending it all through IPC (when the data sets are wild and crazy). From a C/C++ point of view, I would venture to say that it's always a huge win to just stick those embarrassingly easy parallelization cases into a thread with a sync object rather than forking, using IPC, and having to write all the serialization code. And in the case of huge data types--such as video or image rendering--it makes me nervous to think of serializing it all just so it can go through IPC when it could just be passed using a pointer change and a single sync object. So, if I'm missing something and there's a way to pass data structures without serialization, then I'd definitely like to learn more (sorry in advance if I missed something there). When I took a look at multiprocessing, my concerns were:
- serialization (discussed above)
- maturity (are we ready to bet the farm that mp is going to work properly on the platforms we need it to?)
Again, I'm psyched that multiprocessing appeared in 2.6 and it's a huge, huge step in getting everyone to unlock the power of python! But some of the tidbits described above are additional data points for you and others to chew on. I can tell you they're pretty important points for any performance-centric software provider (us, game developers--from EA to Ambrosia, and A/V production app developers like Patrick). Andy -- http://mail.python.org/mailman/listinfo/python-list
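To make the cost contrast concrete, a sketch with illustrative names and sizes: the IPC route copies every byte into and back out of the kernel, on top of whatever serialization code the data structures need, while the shared-address-space route is a pointer assignment guarded by a sync object like the one sketched earlier.

    #include <string.h>
    #include <unistd.h>

    #define FRAME_BYTES (300u * 1024u * 1024u)  /* illustrative size */

    /* IPC route: every byte is copied through the kernel, and any
       structured data needs serialize/deserialize code on top. */
    void send_via_pipe(int wfd, const char *frame)
    {
        size_t off = 0;
        while (off < FRAME_BYTES) {
            ssize_t n = write(wfd, frame + off, FRAME_BYTES - off);
            if (n <= 0)
                break;                 /* error handling elided      */
            off += (size_t)n;
        }
    }

    /* Shared-address-space route: the "transfer" is a pointer swap
       (performed under a mutex/condvar as sketched earlier). */
    void send_via_pointer(const char **slot, const char *frame)
    {
        *slot = frame;                 /* O(1), no copies, no codecs */
    }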