I have been thinking about the multithreading problem in PyPy for a while and 
I have come up with an idea. I'd like to have feedback from people who know 
the codebase well.

The first and hardest step is to change the PyPy runtime so that it can run 
multiple threads at the same time. To simplify matters, we allocate all 
external resources to one thread to start with. We assume that the other 
threads neither use those resources nor call into extension modules that do 
messy things.

Whenever we spawn a new thread, we give it its own object space and its own 
instance of the memory manager/garbage collector. 
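
A rough sketch of the idea in plain Python, not PyPy code: `ObjectSpace` and 
`spawn` are hypothetical names, and the list is only a stand-in for a private 
heap. The point is that the space is created inside the new thread and never 
handed to anyone else. The `results` dict is shared purely so the example can 
report something.

```python
import threading


class ObjectSpace:
    """Hypothetical stand-in for a per-thread object space plus its GC heap."""

    def __init__(self, name):
        self.name = name
        self.heap = []  # stand-in for memory owned by this thread only

    def allocate(self, value):
        # Allocation touches only this thread's heap, so no lock is taken.
        self.heap.append(value)
        return len(self.heap) - 1


def spawn(name, work, results):
    """Start a thread that builds its own ObjectSpace and runs work(space)."""

    def run():
        space = ObjectSpace(name)  # created in the new thread, never shared
        work(space)
        results[name] = len(space.heap)  # reporting only, for the demo

    thread = threading.Thread(target=run)
    thread.start()
    return thread
```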

Having gotten this far, we would have N threads that could run in parallel. 
Since they have no interaction with each other and no contention for 
resources, they would require no locking mechanism. The thread with the 
external resources would still be dependent on the GIL, but the other ones 
wouldn't even see it.

This setup would of course be utterly useless, because all but one of the 
threads would have no means of communicating their results to the world.

So, in a second step, we provide special data types that can be shared 
between threads. These would typically be allocated in non-movable memory, to 
avoid the complexity of garbage collecting memory that is in shared use. You 
can make simple FIFO structures for communication between the threads, and 
complex structures with advanced algorithms for dealing with shared access.
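
To illustrate the simple FIFO case, here is a minimal sketch using the 
standard library's `queue.Queue`, which does its own locking internally. It is 
only an analogy at the Python level: in the proposal the queue itself would be 
one of the special shared types living in non-movable memory. The names 
`worker` and `run_squares` are made up for the example.

```python
import queue
import threading


def worker(inbox, outbox):
    """Read items from inbox until a None sentinel arrives; emit squares."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


def run_squares(values):
    """Send values to a worker thread through one FIFO, read results from another."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for value in values:
        inbox.put(value)
    inbox.put(None)  # sentinel: tells the worker to stop
    thread.join()
    return [outbox.get() for _ in values]
```

The two queues are the only objects both threads touch; everything else stays 
private to its thread.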

In a third step, you may relax the requirement that the first thread owns all 
resources. You should be able to hand them out in a controlled manner. For 
instance, you may want to spawn a thread for each socket connection and have 
that thread deal with all the communication with the socket.
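
The socket case could look roughly like this. It is a sketch under the 
assumption that ownership is a matter of convention: once `hand_out` (a 
hypothetical name) has passed the connection to the new thread, the caller 
does not touch it again.

```python
import socket
import threading


def echo_upper(conn):
    """Sole owner of conn: read until the peer closes, reply in upper case."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data.upper())


def hand_out(conn):
    """Give conn to a new thread; the caller must not use conn afterwards."""
    thread = threading.Thread(target=echo_upper, args=(conn,))
    thread.start()
    return thread
```

A server would call `hand_out` on each connection returned by `accept()`; for 
trying it out, `socket.socketpair()` gives two connected ends without needing 
a listening port.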

Now I wonder about the feasibility of the first step. How much global state 
would have to be wrapped in per-thread objects and how hard would that be? What 
other obstacles would there be to doing this change? I guess there is a 
complication with requesting memory from the kernel and returning memory, but 
I think that could be solved in more or less elegant ways.
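
By wrapping global state in per-thread objects I mean something like the 
following, shown with `threading.local` at the Python level. This is only an 
analogy; the counter and the function names are invented for the example, and 
the real work would be in the interpreter's own globals.

```python
import threading

# What used to be a module-level global becomes an attribute of a
# thread-local object, so every thread sees its own independent copy.
_state = threading.local()


def get_counter():
    """Return this thread's counter, creating it on first use."""
    if not hasattr(_state, "counter"):
        _state.counter = 0
    return _state.counter


def bump():
    """Increment and return this thread's counter; other threads are unaffected."""
    _state.counter = get_counter() + 1
    return _state.counter
```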

Jacob Hallén
