Re: 2.6, 3.0, and truly independent intepreters

2008-11-10 Thread Andy O'Meara
On Nov 5, 5:09 pm, Paul Boddie [EMAIL PROTECTED] wrote:


 Anyway, to keep things constructive, I should ask (again) whether you
 looked at tinypy [1] and whether that might possibly satisfy your
 embedded requirements.

Actually, I'm starting to get into the tinypy codebase and have been
talking in detail with the leads for that project (I just branched it,
in fact).  TP indeed has all the right ingredients for a CPython ES
API, so I'm currently working on a first draft. Interestingly, the TP
VM is largely based on Lua's implementation and stresses compactness.
One challenge is that it's design may be overly compact, making it a
little tricky to extend and maintain (but I anticipate things will
improve as we rev it).

When I have a draft of this CPythonES API, I plan to post here for
everyone to look at and give feedback on.  The only thing that sucks
is that I have a lot of other commitments right now, so I can't spend
the time on this that I'd like to.  Once we have that API finalized,
I'll be able to start offering some bounties for filling in some of
its implementation.  In any case, I look forward to updating folks
here on our progress!

Andy

--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-11-10 Thread Andy O'Meara
On Nov 6, 8:25 am, sturlamolden [EMAIL PROTECTED] wrote:
 On Nov 5, 8:44 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

  In a few earlier posts, I went into detail about what's meant there:

 http://groups.google.com/group/comp.lang.python/browse_thread/thread/...

 All this says is:

 1. The cost of serialization and deserialization is too large.
 2. Complex data structures cannot be placed in shared memory.

 The first claim is unsubstantiated. It depends on how much and what
 you serialize.

Right, but I'm telling you that it *is* substantial...  Unfortunately,
you can't serialize thousands of opaque OS objects (which undoubtedly
contain sub-allocations and pointers) in a frame-based, performance-
centric app.  Please consider that others (such as myself) are not
trying to be difficult here--it turns out that we're actually
professionals.  Again, I'm not the type to compare credentials, but it
would be nice if you considered that you aren't the final authority on
real-time professional software development.


 The second claim is plain wrong. You can put anything you want in
 shared memory. The mapping address of the shared memory segment may
 vary, but it can be dealt with (basically use integers instead of
 pointers, and use the base address as offset.)
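
(For reference, the offset technique being described would look roughly
like this in C -- a minimal sketch, assuming a POSIX shared-memory
mapping; the names and layout are illustrative and error handling is
omitted:)

    /* Store offsets from the segment base instead of raw pointers, so any
       process can walk the structure no matter where it maps the segment. */
    #include <stddef.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    typedef struct {
        size_t next;      /* offset of the next node from base, 0 = none */
        int    payload;
    } Node;

    /* Turn an offset into a pointer valid in *this* process's mapping. */
    static Node *node_at(void *base, size_t off) {
        return off ? (Node *)((char *)base + off) : NULL;
    }

    int main(void) {
        int fd = shm_open("/demo_seg", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 1 << 20);
        void *base = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);

        /* Two-node list expressed purely in offsets. */
        Node *a = node_at(base, 16);
        Node *b = node_at(base, 16 + sizeof(Node));
        a->payload = 1;  a->next = (size_t)((char *)b - (char *)base);
        b->payload = 2;  b->next = 0;

        munmap(base, 1 << 20);
        close(fd);
        return 0;
    }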

I explained this in other posts: OS objects are opaque and their
serialization has to be done via their APIs, which is never marketed
as being fast *OR* cheap.  I've gone into this many times and in many
posts.

 Saying that it can't be done is silly before you have tried.

Your attitude and unwillingness to look at the use cases listed by
myself and others in this thread show that this discussion may not be
a good use of your time.  In any case, you haven't even acknowledged
that a package can't "wag the dog" when it comes to app development--
and that's the bottom line and root liability.


Andy




--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-11-10 Thread Andy O'Meara
On Nov 6, 9:02 pm, sturlamolden [EMAIL PROTECTED] wrote:
 On Nov 7, 12:22 am, Walter Overby [EMAIL PROTECTED] wrote:

  I read Andy to stipulate that the pipe needs to transmit hundreds of
  megs of data and/or thousands of data structure instances.  I doubt
  he'd be happy with memcpy either.  My instinct is that contention for
  a lock could be the quicker option.

 If he needs to communicate that amount of data very often, he has a
 serious design problem.


Hmmm...  Your comment there seems to be an indicator that you don't
have a lot of experience with real-time, performance-centric apps.
Consider my previously listed examples of video rendering and
programmatic effects in real-time.  You need to have a lot of stuff in
threads being worked on, and as Walter described, using a signal
rather than serialization is the clear choice.  Or, consider Patrick's
case where you have massive amounts of audio being run through a DSP--
it just doesn't make sense to serialize an intricate, high-level object
when you could otherwise just hand it off via a single sync step.
Walter and Paul really get what's being said here, so that should be
an indicator to take a step back for a moment and ease up a bit...
C'mon, man--we're all on the same side here!  :^)
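
(To make the "single sync step" concrete: handing an intricate result to
another app thread can be a mutex-protected pointer swap rather than a
serialization pass.  A minimal C sketch, with illustrative names:)

    #include <pthread.h>
    #include <stddef.h>

    typedef struct Frame Frame;      /* arbitrarily large/intricate result */

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  ready;
        Frame          *slot;        /* NULL = empty */
    } Handoff;

    #define HANDOFF_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL }

    /* Producer thread: the entire "transfer" is one pointer store. */
    void handoff_put(Handoff *h, Frame *f) {
        pthread_mutex_lock(&h->lock);
        h->slot = f;
        pthread_cond_signal(&h->ready);
        pthread_mutex_unlock(&h->lock);
    }

    /* Consumer thread: takes ownership of the data; nothing was copied. */
    Frame *handoff_take(Handoff *h) {
        pthread_mutex_lock(&h->lock);
        while (h->slot == NULL)
            pthread_cond_wait(&h->ready, &h->lock);
        Frame *f = h->slot;
        h->slot = NULL;
        pthread_mutex_unlock(&h->lock);
        return f;
    }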


Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-11-05 Thread Andy O'Meara
On Nov 4, 10:59 am, sturlamolden [EMAIL PROTECTED] wrote:
 On Nov 4, 4:27 pm, Andy O'Meara [EMAIL PROTECTED] wrote:

  People
  in the scientific and academic communities have to understand that the
  dynamics in commercial software can involve *very* different needs, and
  they have to show some open-mindedness there.

 You are aware that the BDFL's employer is a company called Google? Python
 is not just used in academic settings.

Turns out I have heard of Google (and how about you be a little more
courteous).  If you've read the posts in this thread, you'll note that
the needs outlined here are quite different from the needs and
interests of Google.  Note that my point was that python *could* and
*should* be used more in end-user/desktop applications, but it can't
"wag the dog", to use my earlier phrase.


 Furthermore, I gave you a link to cilk++. This is a simple tool that
 allows you to parallelize existing C or C++ software using three small
 keywords.

Sorry if it wasn't clear, but we need the features associated with an
embedded interpreter.  I checked out cilk++ when you linked it and
although it seems pretty cool, it's not a good fit for us for a number
of reasons.  Also, we like the idea of helping support a FOSS project
rather than licensing a proprietary product (again, to be clear, using
cilk isn't even appropriate for our situation).


  As other posts have gone into extensive detail, multiprocessing
  unfortunately doesn't handle the massive/complex data structures
  situation (see my posts regarding real-time video processing).  

 That is something I don't believe. Why can't multiprocessing handle
 that?

In a few earlier posts, I went into detail about what's meant there:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/9d995e4a1153a1b2/09aaca3d94ee7a04?lnk=st#09aaca3d94ee7a04
http://groups.google.com/group/comp.lang.python/msg/edae2840ab432344
http://groups.google.com/group/comp.lang.python/msg/5be213c31519217b

 For Christ sake, researchers
 write global climate models using MPI. And you think a toy problem
 like 'real-time video processing' is a show stopper for using multiple
 processes.

I'm not sure why you're posting this sort of stuff when it seems like
you haven't checked out earlier posts in this thread.  Also, you
do yourself and the people here a disservice in the way that you're
speaking to me here.  You never know who you're really talking to or
who's reading.


Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-11-04 Thread Andy O'Meara
On Nov 4, 9:38 am, sturlamolden [EMAIL PROTECTED] wrote:


 First let me say that there are several solutions to the multicore
 problem. Multiple independent interpreters embedded in a process is
 one possibility, but not the only one.

No one disagrees there.  However, the motivation of this thread has
been to get people here to consider that it's much more preferable for
CPython to have as few restrictions as possible on how it's used.  I
think many people here assume that python is the showcase item in
industrial and commercial use, but it's generally just one of many
pieces of machinery that serve the app's function (so the tail can't
wag the dog when it comes to app design).  Some people in this thread
have made comments such as "make your app run in python" or "change
your app requirements", but in the world of production schedules and
making sure payroll is met, those options just can't happen.  People
in the scientific and academic communities have to understand that the
dynamics in commercial software can involve *very* different needs, and
they have to show some open-mindedness there.


 The multiprocessing package has almost the same API as you would get
 from your suggestion, the only difference being that multiple
 processes is involved.

As other posts have gone into extensive detail, multiprocessing
unfortunately doesn't handle the massive/complex data structures
situation (see my posts regarding real-time video processing).  I'm
not sure if you've followed all the discussion, but multiple processes
are off the table (this is discussed at length, so just flip back into
the thread history).


Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-11-03 Thread Andy O'Meara
On Oct 30, 11:09 pm, alex23 [EMAIL PROTECTED] wrote:
 On Oct 31, 2:05 am, Andy O'Meara [EMAIL PROTECTED] wrote:

  I don't follow you there.  If you're referring to multiprocessing, our
  concerns are:

  - Maturity (am I willing to tell my partners and employees that I'm
  betting our future on a brand-new module that imposes significant
  restrictions as to how our app operates?)
  - Liability (am I ready to invest our resources into lots of new
  python module-specific code to find out that a platform that we want
  to target isn't supported or has problems?).  Like it or not, we're a
  company and we have to show sensitivity about new or fringe packages
  that make our codebase less agile -- C/C++ continues to win the day in
  that department.

 I don't follow this...wouldn't both of these concerns be even more
 true for modifying the CPython interpreter to provide the
 functionality you want?


A great point, for sure.  So, basically, the motivation and goal of
this entire thread is to get an understanding of how enthusiastic/
interested the CPython dev community is in the concepts/enhancements
under discussion and for all of us to better understand the root
issues.  So my response is basically that it was my intention to seek
official/sanctioned development (and to contribute direct developer
support and compensation).

My hope was that the increasing interest and value associated with
flexible, multi-core/free-thread support is at a point where there's
a critical mass of CPython developer interest (as indicated by various
serious projects specifically meant to offer this support).
Unfortunately, based on the posts in this thread, it's becoming clear
that the scale of code changes, design changes, and testing that are
necessary in order to offer this support is just too large unless the
entire community is committed to the cause.

Meanwhile, as many posts in the thread have pointed out, issues such
as free threading and easy/clean/compartmentalized use of python are
of rising importance to app developers shopping for an interpreter to
embed.  So unless/until CPython offers the flexibility some apps
require as an embedded interpreter, we commercial guys are
unfortunately forced to use alternatives to python.  I just think it'd
be huge win for everyone (app developers, the python dev community,
and python proliferation in general) if python made its way into more
commercial and industrial applications (in an embedded capacity).


Andy






--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Andy O'Meara


 Okay, here's the bottom line:
 * This is not about the GIL.  This is about *completely* isolated
 interpreters; most of the time when we want to remove the GIL we want
 a single interpreter with lots of shared data.
 * Your use case, although not common, is not extraordinarily rare
 either.  It'd be nice to support.
 * If CPython had supported it all along we would continue to maintain
 it.
 * However, since it's not supported today, it's not worth the time
 invested, API incompatibility, and general breakage it would imply.
 * Although it's far more work than just solving your problem, if I
 were to remove the GIL I'd go all the way and allow shared objects.


Great recap (although saying it's not about the GIL may cause some
people to lose track of the root issues here, but your following comment
on GIL removal shows that we're on the same page).

 So there's really only two options here:
 * get a short-term bodge that works, like hacking the 3rd party
 library to use your shared-memory allocator.  Should be far less work
 than hacking all of CPython.

The problem there is that we're not talking about a single 3rd party
API/allocator--there are many, including the OS, which has its own
internal allocators.  My video encoding example is meant to illustrate
a point, but the real-world use case is where there are allocators all
over the place from all kinds of APIs, and where you want your C module
to reenter the interpreter often to execute python helper code.

 * invest yourself in solving the *entire* problem (GIL removal with
 shared python objects).

Well, as I mentioned, I do represent a company willing and able to
expend real resources here.  However, as you pointed out, there's some
serious work at hand here (sadly--it didn't have to be this way), and
there seem to be some really polarized people here who don't seem as
interested as I am in making python more attractive for app developers
shopping for an interpreter to embed.

From our point of view, there are two other options, which
unfortunately seem to be the only way out the more we uncover with
this discussion:

3) Start a new python implementation, let's call it CPythonES, that
specifically targets performance apps and uses an explicit object/
context concept to permit the free threading under discussion here.
The idea would be to just implement the core language, feature set,
and a handful of modules.  I refer you to the list I made earlier of
essential modules.

4) Drop python, switch to Lua.

The interesting thing about (3) is that it'd be in the same spirit as
how OpenGL ES came to be (except that in place of the need for free
threading, the issue was that the standard OpenGL API was too
overgrown and painful for the embedded scale).

We're currently doing our own in-house version of (3), but we unfortunately
have other priorities at the moment that would otherwise slow this
down.  Given the direction of many-core machines these days, option
(3) or (4), for us, isn't a question of *if*, it's a question of
*when*.  So that's basically where we're at right now.

As to my earlier point about representing a company ready to spend
real resources, please email me off-list if anyone here would have an
interest in an open CPythonES project (and get full compensation).
I can say for sure that we'd be able to lead with API framework design
work--that's my personal strength and we have a lot of real world
experience there.

Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Andy O'Meara
On Oct 28, 6:11 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  Because then we're back into the GIL not permitting threads efficient
  core use on CPU bound scripts running on other threads (when they
  otherwise could).

 Why do you think so? For C code that is carefully written, the GIL
 allows *very well* to write CPU bound scripts running on other threads.
 (please do get back to Jesse's original remark in case you have lost
 the thread :-)


I don't follow you there.  If you're referring to multiprocessing, our
concerns are:

- Maturity (am I willing to tell my partners and employees that I'm
betting our future on a brand-new module that imposes significant
restrictions as to how our app operates?)
- Liability (am I ready to invest our resources into lots of new
python module-specific code to find out that a platform that we want
to target isn't supported or has problems?).  Like it or not, we're a
company and we have to show sensitivity about new or fringe packages
that make our codebase less agile -- C/C++ continues to win the day in
that department.
- Shared memory -- for the reasons listed in my other posts, IPC or a
shared/mapped memory region doesn't work for our situation (and I
venture to say, for many real world situations; otherwise you'd see
end-user/common apps use forking more often than threading).



  It turns out that this isn't an exotic case
  at all: there's a *ton* of utility gained by making calls back into
  the interpreter.  The best example is that since code is more easily
  maintained in python than in C, a lot of the module utility code is
  likely to be in python.

 You should really reconsider writing performance-critical code in
 Python.

I don't follow you there...  Performance-critical code in Python??
Suppose you're doing pixel-level filters on images or video, or
Patrick needs to apply a DSP to some audio...  Our app's performance
would *tank*, in a MAJOR way (that, and/or background tasks would take
100x+ longer to do their work).

 Regardless of the issue under discussion, a lot of performance
 can be gained by using flattened data structures, less pointer,
 less reference counting, less objects, and so on - in the inner loops
 of the computation. You didn't reveal what *specific* computation you
 perform, so it's difficult to give specific advise.

I tried to list some abbreviated examples in other posts, but here's
some elaboration:

- Pixel-level effects and filters, where some filters may use C procs
while others may call back into the interpreter to execute logic --
while some do both, multiple times.
- Image and video analysis/recognition where there's TONS of intricate
data structures and logic.  Those data structures and logic are
easiest to develop and maintain in python, but you'll often want to
call back to C procs which will, in turn, want to access Python (as
well as C-level) data structures.

The common pattern here is where there's a serious mix of C and python
code and data structures, BUT it can all be done with a free-thread
mentality since the finish point is unambiguous and distinct -- where
all the results are handed back to the main app in a black-and-white
handoff.  It's *really* important for an app to freely make calls into
its interpreter (or the interpreter's data structures) without having
to perform locking/unlocking because that affords an app a *lot* of
options and design paths.  It's just not practical to be locking and
unlocking the GIL when you want to operate on python data structures
or call back into python.
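
(For readers unfamiliar with the mechanics being objected to: with
today's CPython, every touch of python state from an app-created thread
has to be bracketed with GIL-state calls, roughly like this -- a minimal
sketch; the helper callable and its argument are illustrative:)

    #include <Python.h>

    void worker_step(PyObject *helper, double value)
    {
        /* Block until this thread holds the GIL. */
        PyGILState_STATE gstate = PyGILState_Ensure();

        /* Only now is it safe to touch python objects or run python code. */
        PyObject *result = PyObject_CallFunction(helper, "d", value);
        Py_XDECREF(result);

        /* Hand the GIL back so other threads can proceed. */
        PyGILState_Release(gstate);

        /* ...continue with C-level work that never touches python... */
    }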

You seem to have placed the burden of proof on my shoulders for an app
to deserve the ability to free-thread when using 3rd party packages,
so how about we just agree it's not an unreasonable desire for a
package (such as python) to support it and move on with the
discussion.


 Again, if you do heavy-lifting in Python, you should consider to rewrite
 the performance-critical parts in C. You may find that the need for
 multiple CPUs goes even away.

Well, the entire premise we're operating under here is that we're
dealing with embarrassingly easy parallelization scenarios, so when
you suggest that the need for multiple CPUs may go away, I'm worried
that you're not keeping the big picture in mind.


  I appreciate your arguments that a PyC concept is a lot of work and
  needs some careful design work, but let's not kill the discussion just
  because of that.

 Any discussion in this newsgroup is futile, except when it either
 a) leads to a solution that is already possible, and the OP didn't
 envision, or
 b) is followed up by code contributions from one of the participants.

 If neither is likely to result, killing the discussion is the most
 productive thing we can do.


Well, most others here seem to have a very different definition of what
qualifies as a futile discussion, so how about you allow the rest of
us to continue to discuss these issues and possible solutions.  And, for
the record, I've said multiple times I'm ready to 

Re: 2.6, 3.0, and truly independent intepreters

2008-10-30 Thread Andy O'Meara
On Oct 30, 1:00 pm, Jesse Noller [EMAIL PROTECTED] wrote:


 Multiprocessing is written in C, so as for the less agile - I don't
 see how it's any less agile then what you've talked about.

Sorry for not being more specific there, but by "less agile" I meant
that an app's codebase is less agile if python is an absolute
requirement.  If I was told tomorrow that for some reason we had to
drop python and go with something else, it's my job to have chosen a
codebase path/roadmap such that my response back isn't just "well,
we're screwed then."  Consider modern PC games.  They have huge code
bases that use DirectX and OpenGL, and having a roadmap of flexibility
is paramount, so packages they choose to use are used in a contained
and hedged fashion.  It's a survival tactic for a company not to
entrench themselves in a package or technology if they don't have to
(and that's what I keep trying to raise in the thread--that the python
dev community should embrace development that makes python a leading
candidate for lightweight use).  Companies want to build flexible,
powerful codebases that are married to as few components as
possible.


  - Shared memory -- for the reasons listed in my other posts, IPC or a
  shared/mapped memory region doesn't work for our situation (and I
  venture to say, for many real world situations otherwise you'd see end-
  user/common apps use forking more often than threading).

 I would argue that the reason most people use threads as opposed to
 processes is simply based on ease of use and entry (which is ironic,
 given how many problems it causes).

No, we're in agreement here -- I was just trying to offer a more
detailed explanation of "ease of use".  It's easy because memory is
shared and no IPC, serialization, or special allocator code is
required.  And as we both agree, it's far from easy once those
threads need to interact with each other.  But again, my goal here is
to stay on the embarrassingly easy parallelization scenarios.



 I would argue that most of the people taking part in this discussion
 are working on real world applications - sure, multiprocessing as it
 exists today, right now - may not support your use case, but it was
 evaluated to fit *many* use cases.

And as I've mentioned, it's a totally great endeavor to be super proud
of.  That suite of functionality alone opens some *huge* doors for
python, and I hope folks that use it appreciate how much time and
thought undoubtedly had to go into it.  You get total props, for
sure, and your work is a huge and unique credit to the community.


 Please correct me if I am wrong in understanding what you want: You
 are making threads in another language (not via the threading API),
 embed python in those threads, but you want to be able to share
 objects/state between those threads, and independent interpreters. You
 want to be able to pass state from one interpreter to another via
 shared memory (e.g. pointers/contexts/etc).

 Example:

 ParentAppFoo makes 10 threads (in C)
 Each thread gets an itty bitty python interpreter
 ParentAppFoo gets a object(video) to render
 Rather then marshal that object, you pass a pointer to the object to
 the children
 You want to pass that pointer to an existing, or newly created itty
 bitty python interpreter for mangling
 Itty bitty python interpreter passes the object back to a C module via
 a pointer/context

 If the above is wrong, I think possible outlining it in the above form
 may help people conceptualize it - I really don't think you're talking
 about python-level processes or threads.


Yeah, you have it right-on there, with the added fact that the C and
python execution (and data access) are highly intertwined (so getting
and releasing the GIL would have to be happening all over).  For
example, consider the dynamics, logic, algorithms, and data structures
associated with image and video effects and image/video
recognition/analysis.


Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Andy O'Meara
On Oct 26, 10:11 pm, James Mills [EMAIL PROTECTED]
wrote:
 On Mon, Oct 27, 2008 at 12:03 PM, Andy O'Meara [EMAIL PROTECTED] wrote:
  I think we miscommunicated there--I'm actually agreeing with you.  I
  was trying to make the same point you were: that intricate and/or
  large structures are meant to be passed around by a top-level pointer,
   not using serialization/messaging.  This is what I've been trying
  to explain to others here; that IPC and shared memory unfortunately
  aren't viable options, leaving app threads (rather than child
  processes) as the solution.

 Andy,

 Why don't you just use a temporary file
 system (ram disk) to store the data that
 your app is manipulating. All you need to
 pass around then is a file descriptor.

 --JamesMills

Unfortunately, it's the penalty of serialization and unserialization.
When you're talking about stuff like memory-resident images and video
(complete with their intricate and complex codecs), then the only
option is to be passing around a couple of pointers rather than take
the hit of serialization (which is huge for video, for example).  I've
gone into more detail in some other posts but I could have missed
something.


Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Andy O'Meara
On Oct 27, 4:05 am, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Andy O'Meara wrote:



  Well, when you're talking about large, intricate data structures
  (which include opaque OS object refs that use process-associated
  allocators), even a shared memory region between the child process and
  the parent can't do the job.  Otherwise, please describe in detail how
  I'd get an opaque OS object (e.g. an OS ref that refers to memory-
  resident video) from the child process back to the parent process.

 WHAT PARENT PROCESS? In the same address space, to me, means
 a single process only, not multiple processes, and no parent process
 anywhere. If you have just multiple threads, the notion of passing
 data from a child process back to the parent process is
 meaningless.

I know...  I was just responding because you and others here keep
beating the fork drum.  I was just trying to make it clear that a
shared address space is the only way to go.  Ok, good, so we're in
agreement that threads are the only way to deal with the intricate and
complex data set issue in a performance-centric application.


  Again, the big picture that I'm trying to plant here is that there
  really is a serious need for truly independent interpreters/contexts
  in a shared address space.

 I understand that this is your mission in this thread. However, why
 is that your problem? Why can't you just use the existing (limited)
 multiple-interpreters machinery, and solve your problems with that?

Because then we're back into the GIL not permitting threads efficient
core use on CPU-bound scripts running on other threads (when they
otherwise could).  Just so we're on the same page, "when they
otherwise could" is relevant here because that's the important given:
that each interpreter (context) truly never has any contact with
others.

An example would be python scripts that generate video programmatically
using an initial set of params and use an in-house C module to
construct frames (which in turn make and modify python C objects that
wrap to intricate codec-related data structures).  Suppose you wanted
to render 3 of these at the same time, one on each thread (3
threads).  With the GIL in place, these threads can't get anywhere
close to their potential.  Your response thus far is that the C module
should release the GIL before it commences its heavy lifting.  Well,
the problem is that during its heavy lifting it needs to call back
into its interpreter.  It turns out that this isn't an exotic case
at all: there's a *ton* of utility gained by making calls back into
the interpreter.  The best example is that since code is more easily
maintained in python than in C, a lot of the module utility code is
likely to be in python.  Unsurprisingly, this is the situation myself
and many others are in: where we want to subsequently use the
interpreter within the C module (so, as I understand it, the proposal
to have the C module release the GIL unfortunately doesn't work as a
general solution).
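
(To sketch the suggestion and where it falls down: the C module can drop
the GIL for its pure-C work, but every call into a python helper
re-serializes all the render threads, because there is only one GIL for
the whole process.  A rough sketch, assuming the caller holds the GIL on
entry; the helper and its argument are illustrative:)

    #include <Python.h>

    void render_frames(PyObject *helper, int nframes)
    {
        Py_BEGIN_ALLOW_THREADS        /* drop the GIL for the C-only work */
        for (int i = 0; i < nframes; i++) {
            /* ...pure C / OS-API frame construction can run in parallel... */

            {   /* ...but each helper call funnels through the single GIL: */
                PyGILState_STATE g = PyGILState_Ensure();
                PyObject *r = PyObject_CallFunction(helper, "i", i);
                Py_XDECREF(r);
                PyGILState_Release(g);
            }
        }
        Py_END_ALLOW_THREADS          /* reacquire the GIL before returning */
    }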


  For most
  industry-caliber packages, the expectation and convention (unless
  documented otherwise) is that the app can make as many contexts as it
  wants in whatever threads it wants, because the convention is that the
  app must (a) never use one context's objects in another context,
  and (b) never use a context at the same time from more than one
  thread.  That's all I'm really trying to look at here.

 And that's indeed the case for Python, too. The app can make as many
 subinterpreters as it wants to, and it must not pass objects from one
 subinterpreter to another one, nor should it use a single interpreter
 from more than one thread (although that is actually supported by
 Python - but it surely won't hurt if you restrict yourself to a single
 thread per interpreter).


I'm not following you there...  I thought we were all in agreement that
the existing C modules are FAR from being reentrant, regularly making
use of static/global objects. The point I had made before is that
other industry-caliber packages specifically don't have restrictions
in *any* way.

I appreciate your arguments that a PyC concept is a lot of work and
needs some careful design work, but let's not kill the discussion just
because of that.  The fact remains that the video encoding scenario
described above is a pretty reasonable situation, and as more people
are commenting in this thread, there's an increasing need to offer
apps more flexibility when it comes to multi-threaded use.
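
(For reference, the limited machinery being referred to -- creating
subinterpreters from C -- looks roughly like this; note that they all
still share the one process-wide GIL, which is the crux of the
complaint.  A minimal sketch against the Python 2.x C API:)

    #include <Python.h>

    int main(void)
    {
        Py_Initialize();                             /* main interpreter */

        PyThreadState *main_ts = PyThreadState_Get();
        PyThreadState *sub = Py_NewInterpreter();    /* fresh modules/builtins */

        /* Runs in the subinterpreter, which is now the current thread state. */
        PyRun_SimpleString("import math; print math.pi");

        Py_EndInterpreter(sub);        /* its thread state must be current */
        PyThreadState_Swap(main_ts);   /* switch back to the main interpreter */

        Py_Finalize();
        return 0;
    }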


Andy




--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Andy O'Meara
On Oct 25, 9:46 am, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 These discussion pop up every year or so and I think that most of them
 are not really all that necessary, since the GIL isn't all that bad.


Thing is, if the topic keeps coming up, then that may be an indicator
that change is truly needed.  Someone much wiser than me once shared
that a measure of the usefulness and quality of a package (or API) is
how easily it can be added to an application--of any flavor--without
the application needing to change.

So in the rising world of idle cores and worker threads, I do see an
increasing concern over the GIL.  Although I recognize that the debate
is lengthy, heated, and has strong arguments on both sides, my reading
on the issue makes me feel like there's a bias for the pro-GIL side
because of the volume of design and coding work associated with
considering various alternatives (such as Glenn's Py* concepts).
And I DO respect and appreciate where the pro-GIL people come from:
who the heck wants to do all that work and recoding so that a tiny
percent of developers can benefit?  And my best response is that as
unfortunate as it is, python needs to be more multi-threaded app-
friendly if we hope to attract the next generation of app developers
that want to just drop python into their app (and not have to change
their app around python).  For example, Lua has that property, as
evidenced by its rapidly growing presence in commercial software
(Blizzard uses it heavily, for example).


 Furthermore, there are lots of ways to tune the CPython VM to make
 it more or less responsive to thread switches via the various sys.set*()
 functions in the sys module.

 Most computing or I/O intense C extensions, built-in modules and object
 implementations already release the GIL for you, so it usually doesn't
 get in the way all that often.


The main issue I take there is that it's often highly useful for C
modules to make subsequent calls back into the interpreter. I suppose
the response to that is to acquire the GIL before reentry, but it just
seems to be more code and responsibility in scenarios where it's not
necessary.  Although that code and protocol may come easily to veteran
CPython developers, let's not forget that an important goal is to
attract new developers and companies to the scene, where they get
their thread-independent code up and running using python without any
unexpected reengineering.  Again, why are companies choosing Lua over
Python when it comes to an easy and flexible drop-in interpreter?  And
please take my points here to be exploratory, and not hostile or
accusatory, in nature.


Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-28 Thread Andy O'Meara
On Oct 27, 10:55 pm, Glenn Linderman [EMAIL PROTECTED] wrote:


 And I think we still are miscommunicating!  Or maybe communicating anyway!

 So when you said object, I actually don't know whether you meant
 Python object or something else.  I assumed Python object, which may not
 have been correct... but read on, I think the stuff below clears it up.


 Then when you mentioned thousands of objects, I imagined thousands of
 Python objects, and somehow transforming the blob into same... and back
 again.  

My apologies to you and others here on my use of "objects" -- I use
the term generically and mean it to *not* refer to python objects (for
all the reasons discussed here).  Python only makes up a small
part of our app, hence my habit of using "objects" to refer to other
APIs' allocated and opaque objects (including our own and OS APIs).
For all the reasons we've discussed, in our world, python objects don't
travel around outside of our python C modules -- when python objects
need to be passed to other parts of the app, they're converted into
their non-python (portable) equivalents (ints, floats, buffers, etc--
but most of the time, the objects are PyCObjects, so they can enter and
leave a python context with negligible overhead; a sketch of that
handoff follows the list below).  I venture to say this is pretty
standard when any industry app uses a package (such as python), for
various reasons:
   - Portability/Future (e.g. if we do decide to drop Python and go
with Lua, the changes are limited to only one region of code).
   - Sanity (having any API's objects show up in places far away
goes against easy-to-follow code).
   - MT flexibility (because we never use static/global
storage, we have all kinds of options when it comes to
multithreading).  For example, recall that by throwing python into
multiple dynamic libs, we were able to achieve the GIL-less
interpreter independence that we want (albeit ghetto and a pain).
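
(A minimal sketch of the PyCObject handoff mentioned above, using the
Python 2.x C API; the wrapper names and the AppVideoRef type are
illustrative only:)

    /* Wrapping an opaque, app-owned object so it can cross into and out
       of a python context with negligible overhead (Python 2.x API). */
    #include <Python.h>

    typedef struct AppVideoRef AppVideoRef;    /* opaque app/OS object */

    static void release_video(void *p)
    {
        /* app-specific teardown; pass NULL below if the app keeps ownership */
        (void)p;
    }

    /* C -> python: hand the raw pointer to script code as an opaque capsule */
    static PyObject *wrap_video(AppVideoRef *v)
    {
        return PyCObject_FromVoidPtr(v, release_video);
    }

    /* python -> C: recover the raw pointer inside a C module proc */
    static AppVideoRef *unwrap_video(PyObject *obj)
    {
        return (AppVideoRef *)PyCObject_AsVoidPtr(obj);
    }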



Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Andy O'Meara

Grrr... I posted a ton of lengthy replies to you and other recent
posts here using Google and none of them made it, argh.  Poof.  There's
nothing that fires me up more than lost work, so I'll have to
revert to short and simple answers for the time being.  Argh, damn.


On Oct 25, 1:26 am, greg [EMAIL PROTECTED] wrote:
 Andy O'Meara wrote:
  I would definitely agree if there was a context (i.e. environment)
  object passed around then perhaps we'd have the best of all worlds.

 Moreover, I think this is probably the *only* way that
 totally independent interpreters could be realized.

 Converting the whole C API to use this strategy would be
 a very big project. Also, on the face of it, it seems like
 it would render all existing C extension code obsolete,
 although it might be possible to do something clever with
 macros to create a compatibility layer.

 Another thing to consider is that passing all these extra
 pointers around everywhere is bound to have some effect
 on performance.


I'm with you on all counts, so no disagreement there.  On the "passing
a ptr everywhere" issue, perhaps one idea is that all objects could
have an additional field that would point back to their parent context
(i.e. their interpreter).  So the only prototypes that would have to be
modified to contain the context ptr would be the ones that don't
inherently operate on objects (e.g. importing a module).
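
(Purely to illustrate the idea -- none of this exists in CPython, and
every name below is invented -- the object-carries-its-context proposal
might read something like:)

    /* Hypothetical sketch of "every object points back at its owning
       interpreter/context"; not real CPython API. */
    typedef struct PyC_Context PyC_Context;    /* one per interpreter */

    typedef struct {
        long         ob_refcnt;
        void        *ob_type;
        PyC_Context *ob_ctx;     /* NEW: back-pointer to the owning context */
    } PyC_Object;

    /* Calls that operate on an object recover the context from the object
       itself, so their prototypes stay as they are today... */
    PyC_Object *PyC_GetAttr(PyC_Object *obj, const char *name);

    /* ...and only context-level entry points (importing a module, shutting
       the interpreter down, etc.) need an explicit context parameter. */
    PyC_Object *PyC_ImportModule(PyC_Context *ctx, const char *name);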


On Oct 25, 1:54 am, greg [EMAIL PROTECTED] wrote:
 Andy O'Meara wrote:
  - each worker thread makes its own interpreter, pops scripts off a
  work queue, and manages exporting (and then importing) result data to
  other parts of the app.

 I hope you realize that starting up one of these interpreters
 is going to be fairly expensive. It will have to create its
 own versions of all the builtin constants and type objects,
 and import its own copy of all the modules it uses.


Yeah, for sure.  And I'd say that's a pretty well-established
convention already out there for any industry package.  The pattern
I'd expect to see is where the app starts worker threads, starts an
interpreter in one or more of them, and throws jobs to different ones
(and the interpreter would persist to move on to subsequent jobs).

 One wonders if it wouldn't be cheaper just to fork the
 process. Shared memory can be used to transfer large lumps
 of data if needed.


As I mentioned, when you're talking about intricate data structures, OS
opaque objects (i.e. that have their own internal allocators), or huge
data sets, even a shared memory region unfortunately can't fit the
bill.


Andy
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Andy O'Meara
On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  A c-level module, on the other hand, can sidestep/release
  the GIL at will, and go on it's merry way and process away.

  ...Unless part of the C module execution involves the need do CPU-
  bound work on another thread through a different python interpreter,
  right?

 Wrong.


Let's take a step back and remind ourselves of the big picture.  The
goal is to have independent interpreters running in pthreads that the
app starts and controls.  Each interpreter never at any point is doing
any thread-related stuff in any way.  For example, each script job
just does meat-and-potatoes CPU work, using callbacks that, say,
programmatically use OS APIs to edit and transform frame data.

So I think the disconnect here is that maybe you're envisioning
threads being created *in* python.  To be clear, we're talking about
making threads at the app level and making it a given for the app to
take its safety into its own hands.




  As far as I can tell, it seems
  CPython's current state can't support CPU-bound parallelization in the same
  address space.

 That's not true.


Well, when you're talking about large, intricate data structures
(which include opaque OS object refs that use process-associated
allocators), even a shared memory region between the child process and
the parent can't do the job.  Otherwise, please describe in detail how
I'd get an opaque OS object (e.g. an OS ref that refers to memory-
resident video) from the child process back to the parent process.

Again, the big picture that I'm trying to plant here is that there
really is a serious need for truly independent interpreters/contexts
in a shared address space.  Consider stuff like libpng, zlib, libjpg,
or whatever; the use pattern is always the same: make a context
object, do your work in the context, and take it down.  For most
industry-caliber packages, the expectation and convention (unless
documented otherwise) is that the app can make as many contexts as it
wants in whatever threads it wants, because the convention is that the
app must (a) never use one context's objects in another context,
and (b) never use a context at the same time from more than one
thread.  That's all I'm really trying to look at here.
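
(That convention is exactly what zlib itself follows; a minimal sketch,
where each worker thread owns a private z_stream and never shares it:)

    #include <stddef.h>
    #include <string.h>
    #include <zlib.h>

    /* "Make a context, do your work in it, take it down" -- callable from
       any thread, because all state lives in the local z_stream. */
    int compress_in_this_thread(const unsigned char *in, size_t in_len,
                                unsigned char *out, size_t out_cap)
    {
        z_stream zs;                       /* the per-thread context object */
        memset(&zs, 0, sizeof zs);
        if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
            return -1;

        zs.next_in  = (unsigned char *)in;   zs.avail_in  = (uInt)in_len;
        zs.next_out = out;                   zs.avail_out = (uInt)out_cap;

        int rc = deflate(&zs, Z_FINISH);   /* no globals, no cross-thread state */
        deflateEnd(&zs);                   /* take the context down */

        return (rc == Z_STREAM_END) ? (int)(out_cap - zs.avail_out) : -1;
    }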


Andy




--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-26 Thread Andy O'Meara


  And in the case of hundreds of megs of data

 ... and I would be surprised at someone that would embed hundreds of
 megs of data into an object such that it had to be serialized... seems
 like the proper design is to point at the data, or a subset of it, in a
 big buffer.  Then data transfers would just transfer the offset/length
 and the reference to the buffer.

  and/or thousands of data structure instances,

 ... and this is another surprise!  You have thousands of objects (data
 structure instances) to move from one thread to another?


I think we miscommunicated there--I'm actually agreeing with you.  I
was trying to make the same point you were: that intricate and/or
large structures are meant to be passed around by a top-level pointer,
not using serialization/messaging.  This is what I've been trying
to explain to others here: that IPC and shared memory unfortunately
aren't viable options, leaving app threads (rather than child
processes) as the solution.


 Of course, I know that data get large, but typical multimedia streams
 are large, binary blobs.  I was under the impression that processing
 them usually proceeds along the lines of keeping offsets into the blobs,
 and interpreting, etc.  Editing is usually done by making a copy of a
 blob, transforming it or a subset in some manner during the copy
 process, resulting in a new, possibly different-sized blob.


Your instincts are right.  I'd only add that when you're talking
about data structures associated with an intricate video format, the
complexity and depth of the data structures is insane -- the LAST
thing you want to burn cycles on is serializing and unserializing that
stuff (so IPC is out)--again, we're already on the same page here.

I think at one point you made the comment that shared memory is a
solution to handle large data sets between a child process and the
parent.  Although this is certainly true in principle, it doesn't hold
up in practice since complex data structures often contain 3rd party
and OS API objects that have their own allocators.  For example, in
video encoding, there's TONS of objects that comprise memory-resident
video from all kinds of APIs, so the idea of having them allocated
from a shared/mapped memory block isn't even possible.  Again, I only
raise this to offer evidence that doing real-world work in a child
process is a deal breaker--a shared address space is just way too much
to give up.


Andy
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Andy O'Meara
On Oct 24, 9:52 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  A c-level module, on the other hand, can sidestep/release
  the GIL at will, and go on it's merry way and process away.

  ...Unless part of the C module execution involves the need do CPU-
  bound work on another thread through a different python interpreter,
  right?

 Wrong.

  (even if the interpreter is 100% independent, yikes).

 Again, wrong.

  For
  example, have a python C module designed to programmatically generate
  images (and video frames) in RAM for immediate and subsequent use in
  animation.  Meanwhile, we'd like to have a pthread with its own
  interpreter with an instance of this module and have it dequeue jobs
  as they come in (in fact, there'd be one of these threads for each
  excess core present on the machine).

 I don't understand how this example involves multiple threads. You
 mention a single thread (running the module), and you mention designing
 a  module. Where is the second thread?

Glenn seems to be following me here...  The point is to have as many
threads as the app wants, each in its own world, running without
restriction (performance wise).  Maybe the app wants to run a thread
for each extra core on the machine.

Perhaps the disconnect here is that when I've been saying "start a
thread", I mean the app starts an OS thread (e.g. pthread) with the
given that any contact with other threads is managed at the app level
(as opposed to starting threads through python).  So, as far as python
knows, there's zero mention or use of threading in any way,
*anywhere*.


  As far as I can tell, it seems
  CPython's current state can't support CPU-bound parallelization in the same
  address space.

 That's not true.


Um...  So let's say you have an opaque object ref from the OS that
represents hundreds of megs of data (e.g. memory-resident video).  How
do you get that back to the parent process without serialization and
IPC?  What should really happen is to just use the same address space so
that only a pointer changes hands.  THAT's why I'm saying that a separate
address space is generally a deal breaker when you have large or
intricate data sets (ie. when performance matters).

Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Andy O'Meara
On Oct 24, 9:40 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:
  It seems to me that the very simplest move would be to remove global
  static data so the app could provide all thread-related data, which
  Andy suggests through references to the QuickTime API. This would
  suggest compiling python without thread support so as to leave it up
  to the application.

 I'm not sure whether you realize that this is not simple at all.
 Consider this fragment

     if (string == Py_None || index >= state->lastmark ||
         !state->mark[index] || !state->mark[index+1]) {
         if (empty)
             /* want empty string */
             i = j = 0;
         else {
             Py_INCREF(Py_None);
             return Py_None;



The way to think about it is that, ideally in PyC, there are never any
global variables.  Instead, all globals are now part of a context
(i.e. an interpreter) and it would presumably be illegal to ever use
them in a different context.  I'd say this is already the expectation
and convention for any modern, industry-grade software package
marketed as an extension for apps.  Industry app developers just want to
drop in a 3rd party package, make as many contexts as they want (in as
many threads as they want), and expect to use each context without
restriction (since they're ensuring contexts never interact with each
other).  For example, if I use zlib, libpng, or libjpg, I can make as
many contexts as I want and put them in whatever threads I want.  In
the app, the only thing I'm on the hook for is to: (a) never use
objects from one context in another context, and (b) ensure that I
never make any calls into a module from more than one thread at the
same time.  Both of these requirements are trivial to follow in the
embarrassingly easy parallelization scenarios, and that's why I
started this thread in the first place.  :^)

Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Andy O'Meara
On Oct 24, 10:24 pm, Glenn Linderman [EMAIL PROTECTED] wrote:

  And in the case of hundreds of megs of data

 ... and I would be surprised at someone that would embed hundreds of
 megs of data into an object such that it had to be serialized... seems
 like the proper design is to point at the data, or a subset of it, in a
 big buffer.  Then data transfers would just transfer the offset/length
 and the reference to the buffer.

  and/or thousands of data structure instances,

 ... and this is another surprise!  You have thousands of objects (data
 structure instances) to move from one thread to another?

Heh, no, we're actually in agreement here.  I'm saying that in the
case where the data sets are large and/or intricate, a single top-
level pointer changing hands is *always* the way to go rather than
serialization.  For example, suppose you had some nifty python code
and C procs that were doing lots of image analysis, outputting tons of
intricate and rich data structures.  Once the thread is done with that
job, all that output is trivially transferred back to the appropriate
thread by a pointer changing hands.


 Of course, I know that data get large, but typical multimedia streams
 are large, binary blobs.  I was under the impression that processing
 them usually proceeds along the lines of keeping offsets into the blobs,
 and interpreting, etc.  Editing is usually done by making a copy of a
 blob, transforming it or a subset in some manner during the copy
 process, resulting in a new, possibly different-sized blob.

No, you're definitely right-on, with the additional point that the
representation of multimedia usually employs intricate and diverse
data structures (imagine the data structure representation of a movie
encoded in a modern codec, such as H.264, complete with paths, regions,
pixel flow, geometry, transformations, and textures).  As we both
agree, that's something that you *definitely* want to move around via
a single pointer (and not in a serialized form).  Hence, my position
that apps that use python can't be forced to go through IPC or else:
(a) there's a performance/resource waste to serialize and unserialize
large or intricate data sets, and (b) they're required to write and
maintain serialization code that otherwise doesn't serve any other
purpose.

Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-25 Thread Andy O'Meara

 Andy O'Meara wrote:
  I would definitely agree if there was a context (i.e. environment)
  object passed around then perhaps we'd have the best of all worlds.

 Moreover, I think this is probably the *only* way that
 totally independent interpreters could be realized.

 Converting the whole C API to use this strategy would be
 a very big project. Also, on the face of it, it seems like
 it would render all existing C extension code obsolete,
 although it might be possible to do something clever with
 macros to create a compatibility layer.

 Another thing to consider is that passing all these extra
 pointers around everywhere is bound to have some effect
 on performance.


Good points--I would agree with you on all counts there.  On the
"passing a context everywhere" performance hit, perhaps one idea is
that all objects could have an additional field that would point back
to their parent context (ie. their interpreter).  So the only
prototypes that would have to be modified to contain the context ptr
would be the ones that inherently don't take any objects. This would
conveniently and generally correspond to procs associated with
interpreter control (e.g. importing modules, shutting down modules,
etc).


 Andy O'Meara wrote:
  - each worker thread makes its own interpreter, pops scripts off a
  work queue, and manages exporting (and then importing) result data to
  other parts of the app.

 I hope you realize that starting up one of these interpreters
 is going to be fairly expensive.

Absolutely.  I had just left that issue out in an effort to keep the
discussion pointed, but it's a great point to raise.  My response is
that, like any 3rd party industry package, I'd say this is the
expectation (that context startup and shutdown is non-trivial and to
should be minimized for performance reasons).  For simplicity, my
examples didn't talk about this issue but in practice, it'd be typical
for apps to have their worker interpreters persist as they chew
through jobs.


Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara
On Oct 24, 9:35 am, sturlamolden [EMAIL PROTECTED] wrote:
 Instead of appdomains (one interpreter per thread), or free
 threading, you could use multiple processes. Take a look at the new
 multiprocessing module in Python 2.6.

That's mentioned earlier in the thread.


 There is a fundamental problem with using homebrew loading of multiple
 (but renamed) copies of PythonXX.dll that is easily overlooked. That
 is, extension modules (.pyd) are DLLs as well.

Tell me about it--there are all kinds of problems and maintenance
liabilities with our approach.  That's why I'm here talking about this
stuff.

 There are other options as well:

 - Use IronPython. It does not have a GIL.

 - Use Jython. It does not have a GIL.

 - Use pywin32 to create isolated outproc COM servers in Python. (I'm
 not sure what the effect of inproc servers would be.)

 - Use os.fork() if your platform supports it (Linux, Unix, Apple,
 Cygwin, Windows Vista SUA). This is the standard posix way of doing
 multiprocessing. It is almost unbeatable if you have a fast copy-on-
 write implementation of fork (that is, all platforms except Cygwin).

This is discussed earlier in the thread--they're unfortunately all
out.

--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara
On Oct 24, 2:12 am, greg [EMAIL PROTECTED] wrote:
 Andy wrote:
  1) Independent interpreters (this is the easier one--and solved, in
  principle anyway, by PEP 3121, by Martin v. Löwis

 Something like that is necessary for independent interpreters,
 but not sufficient. There are also all the built-in constants
 and type objects to consider. Most of these are statically
 allocated at the moment.


Agreed--I  was just trying to speak generally.  Or, put another way,
there's no hope for independent interpreters without the likes of PEP
3121.  Also, as Martin pointed out, there's the issue of module
cleanup that some guys here may underestimate (and I'm glad Martin pointed
out the importance of it).  Without the module cleanup, every time a
dynamic library using python loads and unloads you've got leaks.  This
issue is a real problem for us since our software is loaded and
unloaded many many times in a host app (iTunes, WMP, etc).  I hadn't
raised it here yet (and I don't want to turn the discussion to this),
but lack of multiple load and unload support has been another painful
issue that we didn't expect to encounter when we went with python.


  2) Barriers to free threading.  As Jesse describes, this is simply
  just the GIL being in place, but of course it's there for a reason.
  It's there because (1) doesn't hold and there was never any specs/
  guidance put forward about what should and shouldn't be done in multi-
  threaded apps

 No, it's there because it's necessary for acceptable performance
 when multiple threads are running in one interpreter. Independent
 interpreters wouldn't mean the absence of a GIL; it would only
 mean each interpreter having its own GIL.


I see what you're saying, but let's note that what you're talking
about at this point is an interpreter containing protection from the
client level violating (supposed) direction put forth in python
multithreaded guidelines.  Glenn Linderman's post really gets at
what's at hand here.  It's really important to consider that it's not
a given that python (or any framework) has to be designed against
hazardous use.  Again, I refer you to the diagrams and guidelines in
the QuickTime API:

http://developer.apple.com/technotes/tn/tn2125.html

They tell you point-blank what you can and can't do, and it's that
simple.  Their engineers can then simply create the implementation
around those specs and not weigh any of the implementation down with
sync mechanisms.  I'm in the camp that simplicity and convention wins
the day when it comes to an API.  It's safe to say that software
engineers expect and assume that a thread that doesn't have contact
with other threads (except for explicit, controlled message/object
passing) will run unhindered and safely, so I raise an eyebrow at the
GIL (or any internal helper sync stuff) holding up a thread's
performance when the app is designed to not need lower-level global
locks.

Anyway, let's talk about solutions.  My company is looking to support
a python dev community endeavor that allows the following:

- an app makes N worker threads (using the OS)

- each worker thread makes its own interpreter, pops scripts off a
work queue, and manages exporting (and then importing) result data to
other parts of the app.  Generally, we're talking about CPU-bound work
here.

- each interpreter has the essentials (e.g. math support, string
support, re support, and so on -- I realize this is open-ended, but
work with me here).
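
(To make the wish concrete -- and this is purely hypothetical, the
PyES_* names below are invented and no such API exists today -- the
worker model above might read roughly like:)

    #include <pthread.h>

    /* Hypothetical fully-independent interpreter handle: no shared GIL. */
    typedef struct PyES_Interp PyES_Interp;
    PyES_Interp *PyES_NewInterp(void);
    int          PyES_RunString(PyES_Interp *interp, const char *script);
    void         PyES_FreeInterp(PyES_Interp *interp);

    /* App-supplied work queue and result handoff (illustrative). */
    extern const char *pop_job(void);
    extern void        publish_result(void);

    static void *worker(void *arg)
    {
        (void)arg;
        PyES_Interp *interp = PyES_NewInterp();   /* one interpreter per thread */
        const char *script;
        while ((script = pop_job()) != NULL) {
            PyES_RunString(interp, script);       /* CPU-bound, runs unhindered */
            publish_result();                     /* pointer handoff to the app */
        }
        PyES_FreeInterp(interp);
        return NULL;
    }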

Let's guesstimate about what kind of work we're talking about here and
if this is even in the realm of possibility.  If we find that it *is*
possible, let's figure out what level of work we're talking about.
From there, I can get serious about writing up a PEP/spec, paid
support, and so on.
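
To make that concrete, here's a hedged sketch of the shape we have in
mind.  There's no independent-interpreter API today, so FakeInterpreter
below is an invented stand-in (it just execs into a private dict); the
point is the per-thread interpreter + work queue structure, written in
2.6-era syntax:

import threading
import Queue                       # 2.x name; it's "queue" in 3.x

class FakeInterpreter(object):
    """Invented stand-in for a hypothetical independent interpreter."""
    def __init__(self):
        self.namespace = {}
    def run(self, script):
        exec script in self.namespace          # 2.x exec statement
        return self.namespace.get('result')
    def destroy(self):
        self.namespace.clear()

work_queue = Queue.Queue()
results = Queue.Queue()

def worker():
    interp = FakeInterpreter()                 # step 1: make the interpreter
    while True:
        script = work_queue.get()              # step 2: pop scripts off a queue
        if script is None:                     # sentinel: shut this worker down
            break
        results.put(interp.run(script))        # step 3: export result data
    interp.destroy()                           # final step: tear it down

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for job in ['result = 2 ** 20', 'result = sum(range(1000))'] + [None] * 4:
    work_queue.put(job)
for t in threads:
    t.join()
print results.get(), results.get()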

Regards,
Andy





--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara


 That aside, the fundamental problem is what I perceive a fundamental
 design flaw in Python's C API. In Java JNI, each function takes a
 JNIEnv* pointer as their first argument. There is nothing that
 prevents you from embedding several JVMs in a process. Python can
 create embedded subinterpreters, but it works differently. It swaps
 subinterpreters like a finite state machine: only one is concurrently
 active, and the GIL is shared.

Bingo, it seems that you've hit it right on the head there.  Sadly,
that's why I regard this thread as largely futile (but I'm an optimist
when it comes to cool software communities so here I am).  I've been
afraid to say it for fear of getting mauled by everyone here, but I
would definitely agree that if there were a context (i.e. environment)
object passed around, then perhaps we'd have the best of all worlds.
*winces*



  This is discussed earlier in the thread--they're unfortunately all
  out.

 It occurs to me that tcl is doing what you want. Have you ever thought
 of not using Python?

Bingo again.  Our research says that the options are tcl, perl
(although it's generally untested and not recommended by the
community--definitely dealbreakers for a commercial user like us), and
lua.  Also, I'd rather saw off my own right arm than adopt perl, so
that's out.  :^)

As I mentioned, we're looking to either (1) support a python dev
community effort, (2) make our own high-performance python interpreter
(that uses an env object as you described), or (3) drop python and go
to lua.  I'm favoring them in the order I list them, but the more I
discuss the issue with folks here, the more people seem to be
unfortunately very divided on (1).

Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara

Glenn, great post and points!


 Andy seems to want an implementation of independent Python processes
 implemented as threads within a single address space, that can be
 coordinated by an outer application.  This actually corresponds to the
 model promulgated in the paper as being most likely to succeed.

Yeah, that's the idea--let the highest levels run and coordinate the
show.


 It does seem simpler and more efficient to simply copy
 data from one memory location to another, rather than send it in a
 message, especially if the data are large.

That's the rub...  In our case, we're doing image and video
manipulation--stuff that's not well suited to messaging from address space to
address space.  The same argument holds for numerical processing with
large data sets.  The workers handing back huge data sets via
messaging isn't very attractive.

 One thing Andy hasn't yet explained (or I missed) is why any of his
 application is coded in a language other than Python.  

Our software runs in real time (so performance is paramount),
interacts with other static libraries, depends on worker threads to
perform real-time image manipulation, and leverages Windows and Mac OS
API concepts and features.  Python's performance hits have generally
been a huge challenge with our animators because they often have to go
back and massage their python code to improve execution performance.
So, in short, there are many reasons why we use python as a part
rather than a whole.

The other area of pain that I mentioned in one of my other posts is
that what we ship, above all, can't be flaky.  The lack of module
cleanup (intended to be addressed by PEP 3121), using a duplicate copy
of the python dynamic lib, and namespace black magic to achieve
independent interpreters are all examples that have made using python
for us much more challenging and time-consuming than we ever
anticipated.

Again, if it turns out nothing can be done about our needs (which
appears more and more to be the case), I think it's important for
everyone here to consider the points raised here in the last week.
Moreover, realize that the python dev community really stands to gain
from making python usable as a tool (rather than a monolith).  This
fact alone has caused lua to *rapidly* rise in popularity with
software companies looking to embed a powerful, lightweight
interpreter in their software.

As a python language fan and enthusiast, don't let lua win!  (I say
this endearingly of course--I have the utmost respect for both
communities and I only want to see CPython be an attractive pick when
a company is looking to embed a language that won't intrude upon their
app's design).


Andy
--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara



 The Global Interpreter Lock is fundamentally designed to make the
 interpreter easier to maintain and safer: Developers do not need to
 worry about other code stepping on their namespace. This makes things
 thread-safe, inasmuch as having multiple PThreads within the same
 interpreter space modifying global state and variables at once is,
 well, bad. A c-level module, on the other hand, can sidestep/release
 the GIL at will, and go on its merry way and process away.

...Unless part of the C module execution involves the need to do CPU-
bound work on another thread through a different python interpreter,
right? (even if the interpreter is 100% independent, yikes).  For
example, have a python C module designed to programmatically generate
images (and video frames) in RAM for immediate and subsequent use in
animation.  Meanwhile, we'd like to have a pthread with its own
interpreter with an instance of this module and have it dequeue jobs
as they come in (in fact, there'd be one of these threads for each
excess core present on the machine).  As far as I can tell, it seems
CPython's current state can't support CPU-bound parallelization in the same
address space (basically, it seems that we're talking about the
embarrassingly parallel scenario raised in that paper).  Why does it
have to be in the same address space?  Convenience and simplicity--the
same reasons that most APIs let you hang yourself if the app does dumb
things with threads.  Also, when the data sets that you need to send
to and from each process are large, using the same address space makes
more and more sense.


 So, just to clarify - Andy, do you want one interpreter, $N threads
 (e.g. PThreads) or the ability to fork multiple heavyweight
 processes?

Sorry if I haven't been clear, but we're talking about the app starting a
pthread, making a fresh/clean/independent interpreter, and then being
responsible for its safety at the highest level (with the payoff of
each of these threads executing without hindrance).  No different
than if you used most APIs out there where step 1 is always to make
and init a context object and the final step is always to destroy/take-
down that context object.

I'm a lousy writer sometimes, but I feel bad if you took the time to
describe threads vs processes.  The only reason I raised IPC with my
"messaging isn't very attractive" comment was to respond to Glenn
Linderman's points regarding tradeoffs of shared memory vs no.


Andy



--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara

Another great post, Glenn!!  Very well laid-out and posed!! Thanks for
taking the time to lay all that out.


 Questions for Andy: is the type of work you want to do in independent
 threads mostly pure Python? Or with libraries that you can control to
 some extent? Are those libraries reentrant? Could they be made
 reentrant? How much of the Python standard library would need to be
 available in reentrant mode to provide useful functionality for those
 threads? I think you want PyC


I think you've defined everything perfectly, and you're of course
correct about my love for the PyC model.  :^)

Like any software that's meant to be used without restrictions, our
code and frameworks always use a context object pattern so that
there's never any non-const global/shared data.  I would go as far as to
say that this is the case with more performance-oriented software than
you may think since it's usually a given for us to have to be parallel
friendly in as many ways as possible.  Perhaps Patrick can back me up
there.
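
For what it's worth, here's a minimal sketch of that context-object
pattern (RenderContext and its fields are names I'm making up purely for
illustration): every bit of mutable state hangs off an explicit object,
so two workers holding their own contexts never touch shared globals and
never need a lock between them.

class RenderContext(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.frames = []             # per-context state, never a module global

    def add_frame(self, pixels):
        self.frames.append(pixels)

# Each worker creates and owns its own context; nothing is shared implicitly.
ctx_a = RenderContext(640, 480)
ctx_b = RenderContext(1920, 1080)
ctx_a.add_frame('x' * (640 * 480))
ctx_b.add_frame('y' * (1920 * 1080))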

As to what modules are essential...  As you point out, once
reentrant module implementations caught on in a PyC or hybrid world, I
think we'd start to see real effort to whip them into compliance--
there's just so much to be gained imho.  But to answer the question,
there's the obvious ones (operator, math, etc), string/buffer
processing (string, re), C bridge stuff (struct, array), and OS basics
(time, file system, etc).  Nice-to-haves would be buffer and image
decompression (zlib, libpng, etc), crypto modules, and xml. As far as
I can imagine, I have to believe all of these modules already contain
little, if any, global data, so I have to believe they'd be super easy
to make PyC happy.  Patrick, what would you see you guys using?


  That's the rub...  In our case, we're doing image and video
  manipulation--stuff not good to be messaging from address space to
  address space.  The same argument holds for numerical processing with
  large data sets.  The workers handing back huge data sets via
  messaging isn't very attractive.

 In the module multiprocessing environment could you not use shared
 memory, then, for the large shared data items?


As I understand things, the multiprocessing module puts stuff in a child
process (i.e. a separate address space), so the only way to get stuff to/
from it is via IPC, which can include a shared/mapped memory region.
Unfortunately, a shared address region doesn't work when you have
large and opaque objects (e.g. a rendered CoreVideo movie in the
QuickTime API or 300 megs of audio data that just went through a
DSP).  Then you've got the hit of serialization if you've got
intricate data structures (that would normally need to be
serialized, such as a hashtable or something).  Also, if I may speak
for commercial developers out there who are just looking to get the
job done without new code, it's usually preferable to just use a
single high-level sync object (for when the job is complete) than to
start a child process and use IPC.  The former is just WAY less
code, plain and simple.
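
For completeness, here's roughly what the shared-memory route looks like
with 2.6's multiprocessing (a sketch, assuming a flat pixel buffer): raw
bytes share fine without serialization, but it's exactly the opaque OS
objects and pointer-laden structures above that can't be dropped into a
region like this.

from multiprocessing import Process, Array

def brighten(buf):
    # The child works directly on the shared bytes -- the pixel data itself
    # is never pickled or pushed through a pipe.
    for i in range(len(buf)):
        buf[i] = min(255, buf[i] + 16)

if __name__ == '__main__':
    pixels = Array('B', 640 * 480, lock=False)   # shared raw byte buffer
    p = Process(target=brighten, args=(pixels,))
    p.start()
    p.join()
    print pixels[0]                              # 16: the child's work is visible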


Andy


--
http://mail.python.org/mailman/listinfo/python-list


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Andy O'Meara


 Are you familiar with the API at all? Multiprocessing was designed to
 mimic threading in about every way possible, the only restriction on
 shared data is that it must be serializable, but even then you can
 override or customize the behavior.

 Also, inter process communication is done via pipes. It can also be
 done with messages if you want to tweak the manager(s).


I apologize in advance if I don't understand something correctly, but
as I understand it, everything has to be serialized in order to go
through IPC.  So when you're talking about thousands of objects,
buffers, and/or large OS opaque objects (e.g. memory-resident video
and images), that seems like a pretty rough hit of run-time resources.

Please don't misunderstand my comments to suggest that multiprocessing
isn't great stuff.  On the contrary, it's very impressive and it
singlehandedly catapults python *way* closer to efficient CPU bound
processing than it ever was before.  All I mean to say is that in the
case where you're using a shared address space with a worker pthread per
spare core to do CPU-bound work, it's a really big win not to have to
serialize stuff.  And in the case of hundreds of megs of data and/or
thousands of data structure instances, it's a deal breaker to
serialize and unserialize everything just so that it can be sent
through IPC.  It's a deal breaker for most performance-centric apps
because of the unnecessary runtime resource hit and because now all
those data structures being passed around have to have accompanying
serialization code written (and maintained) for them.   That's
actually what I meant when I made the comment that a high level sync
object in a shared address space is better than sending it all
through IPC (when the data sets are wild and crazy).  From a C/C++
point of view, I would venture to say that it's always a huge win to
just stick those embarrassingly easy parallelization cases into the
thread with a sync object rather than forking and using IPC and having to
write all the serialization code. And in the case of huge data types--
such as video or image rendering--it makes me nervous to think of
serializing it all just so it can go through IPC when it could just be
passed using a pointer change and a single sync object.
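
To put a rough shape on that trade-off, a back-of-envelope script like
the one below (sizes are made up and the numbers will vary wildly by
machine and data) shows the difference between paying pickle + copy +
unpickle per hand-off and a same-address-space hand-off that amounts to a
reference pushed through a queue:

import pickle
import time
import Queue                     # 2.x name; "queue" in 3.x

# Stand-in for "thousands of data structure instances".
frames = [{'index': i, 'pixels': 'x' * 4096} for i in range(10000)]

t0 = time.time()
blob = pickle.dumps(frames, -1)                  # what an IPC hand-off must pay
restored = pickle.loads(blob)
print 'pickle round trip: %.3f sec, %d bytes' % (time.time() - t0, len(blob))

q = Queue.Queue()
t0 = time.time()
q.put(frames)                                    # in-process hand-off: no copy
print 'queue hand-off:    %.6f sec' % (time.time() - t0)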

So, if I'm missing something and there's a way to pass data structures
without serialization, then I'd definitely like to learn more (sorry
in advance if I missed something there).  When I took a look at
multiprocessing my concerns were:
   - serialization (discussed above)
   - maturity (are we ready to bet the farm that mp is going to work
properly on the platforms we need it to?)

Again, I'm psyched that multiprocessing appeared in 2.6 and it's a
huge huge step in getting everyone to unlock the power of python!
But, then some of the tidbits described above are additional data
points for you and others to chew on.  I can tell you they're pretty
important points for any performance-centric software provider (us,
game developers--from EA to Ambrosia, and A/V production app
developers like Patrick).

Andy










--
http://mail.python.org/mailman/listinfo/python-list