On Sat, Mar 23, 2013 at 1:07 PM, Marko Tasic <mtasi...@gmail.com> wrote:
> Hi,
>
> I have had an idea for some time to re-implement how the GIL works, but
> nothing radical. I am deliberately saying that I don't want to "remove"
> the GIL, only to modify it, so that people who follow this list don't
> get too excited.
>
> Let me describe what I want to accomplish and what kind of results I am
> looking for, and also hear about any design or implementation issues we
> may face.
>
> I want threads to work in parallel as they already do, with very little
> waiting on other threads in the case of GIL'ing. Actually, I want to
> use the time reserved for the GIL to do something else.
>
> To be clear, I don't need a JIT; let's leave it out of the discussion
> for now.
>
> First step:
> -------------
> My solution requires a wait-free, lock-free queue, so that many threads
> can append/push to it atomically. Every thread created, including the
> main thread, has to have (at least) one such queue, and let's say the
> queue has a pointer to the thread it belongs to.
>
> Not only the queue is required, but also a condition variable (such as
> pthread_cond_t) per thread.
>
> What I imagine is a high-level thread object consisting of a low-level
> thread, a low-level wait-free lock-free queue, and a low-level
> condition variable. Imagine that this high-level thread object is local
> to each OS thread created. From now on I will call the high-level
> thread object just "thread". Every mutable object created has to have a
> pointer to the thread in which it was created. This way we can know the
> origin thread of a mutable object by inspecting the object itself, and
> as a consequence also the low-level thread, low-level queue and
> low-level condition variable associated with it.
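To make the "first step" above concrete, here is a minimal Python sketch of the per-thread structure Marko describes. All names (HLThread, MutableObject) are hypothetical, and queue.SimpleQueue merely stands in for a true wait-free lock-free queue; it is not one.

```python
import queue
import threading


class HLThread:
    """Sketch of the proposed high-level thread object: an OS thread
    bundled with a per-thread message queue and a condition variable.

    queue.SimpleQueue is only a placeholder for the wait-free lock-free
    queue the proposal requires; other threads push items onto `inbox`,
    and `cond` is what this thread waits on when it blocks on a remote
    operation."""

    def __init__(self, target):
        self.inbox = queue.SimpleQueue()    # other threads push (op, sender) here
        self.cond = threading.Condition()   # this thread waits on its own condvar
        self.os_thread = threading.Thread(target=target, args=(self,))

    def start(self):
        self.os_thread.start()


class MutableObject:
    """Every mutable object carries a pointer to the thread that
    created it, so any interpreter can find the owner's queue and
    condition variable by inspecting the object itself."""

    def __init__(self, owner, value):
        self.owner = owner
        self.value = value
```

As the original message notes, a dedicated field per object is the naive encoding; per-thread allocation areas would let the owner be derived from the object's address instead.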
You should look more at the Rust model, where "mutable objects" need a
token to be read or written. In Python it's an utter mess, because all
classes are mutable, so you need locking (or some sort of token-passing)
even for object instantiation. This is a major nightmare.

> Having this special field is a naive approach; of course we could
> instead have allocation areas per thread and a smarter GC, but let's
> forget about that for now.
>
> Second step:
> -----------------
> Every time the "ceval"-equivalent function observes a mutable object
> during bytecode interpretation, it checks whether the local thread is
> the same as the object's thread.
>
> If they are the same, it proceeds with the interpretation of the local
> ceval operation.
>
> If they are different, then:
> - let's call the local thread just T
> - let's call the operation just O
> - take the low-level queue of the object's thread; let's call it Q
> - group O and T into an item and push it onto Q
> - make T wait on its condition variable
>
> Third step:
> -------------
> Imagine we have the old-style GIL that counts opcodes executed, but
> after N opcodes, instead of acquiring the GIL, it processes the queue.
> This is done by popping operation (O) and thread (T) items off the
> local thread's queue and executing the operations serially. After each
> executed operation it signals the condition variable of the thread (T)
> that was pushed in the same queue item as the operation. Once the
> signal is received, the waiting thread resumes execution, while the
> thread that was processing the queue continues executing the queued
> operations.
>
> Conclusion:
> ---------------
> There is a ceval per thread, and all cevals run at the same time. In
> the best case they don't wait on each other, because threads do not
> touch objects created in other threads. In the worst case there is
> always only one thread executing an operation while all the others wait
> for it to finish, which is close to how the GIL already works (almost).
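The second and third steps above can be sketched as a dispatch/drain pair. This is only an illustration of the protocol under stated assumptions: the names (Owned, HLThread, execute, drain) are invented, queue.SimpleQueue stands in for the wait-free lock-free queue, and the real thing would live inside the interpreter loop, not in Python code.

```python
import queue
import threading


class HLThread:
    """Per-thread state: inbox for remote operations, condvar to block
    on while another thread performs an operation on our behalf."""
    def __init__(self):
        self.inbox = queue.SimpleQueue()   # placeholder for the lock-free queue
        self.cond = threading.Condition()
        self.done = False                  # set by the owner after running our op


class Owned:
    """A mutable object tagged with the thread that created it."""
    def __init__(self, owner, value):
        self.owner = owner
        self.value = value


def execute(current, obj, op):
    """Second step: if `current` owns `obj`, run op locally; otherwise
    push (O, T) onto the owner's queue Q and wait on T's condvar."""
    if obj.owner is current:
        return op(obj)
    with current.cond:
        current.done = False
        obj.owner.inbox.put((op, obj, current))
        while not current.done:
            current.cond.wait()


def drain(current):
    """Third step: called by the owner every N opcodes instead of the
    old GIL check. Pop queued operations, run them serially, and signal
    each waiting thread's condition variable."""
    while True:
        try:
            op, obj, waiter = current.inbox.get_nowait()
        except queue.Empty:
            break
        op(obj)
        with waiter.cond:
            waiter.done = True
            waiter.cond.notify()
```

Note that in this sketch the waiter blocks until the owner's next drain, so cross-thread operations cost up to one check interval of latency; the proposal's worst case (everyone queued behind one owner) degenerates to roughly GIL-like serial execution, as the conclusion says.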
> Special care has to be taken with threads that are about to finish:
> what happens to their objects if they are referenced by objects that
> belong to other threads?

For what it's worth, this is *hard* - you need to write a new GC for
starters.

> So I also have some questions:
> - How flexible is it to modify the GIL in pypy?

Very - it's a translation transformation.

> - Where should I look?

At whoever calls objectmodel.invoke_around_extcall, from module/thread.

> - Do I need to fully translate pypy in order to test it?

No, you can experiment on smaller languages (we do it all the time for
the JIT, GC etc.).

> - I need a wait-free, lock-free queue, so what kind of low-level atomic
> primitives does RPython offer?

None of what you want; you need to implement it.

> - I need to have a special field on all mutable objects, so how do I do
> that?

You can do this by judging in the rtyper whether all the fields are
immutable.

> I do not expect any performance boost, just correctness of approach.
> Perhaps I'm wrong.
>
> What do you think, is this a practical approach?

It requires a lot of work, but it can in theory work. The token-passing
for mutable objects puts a heavy burden on, say, allocations or global
lookups. I don't think you would end up with better performance than
STM, although feel free to try.

Cheers,
fijal
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev