On Sat, Mar 23, 2013 at 1:07 PM, Marko Tasic <mtasi...@gmail.com> wrote:
> Hi,
>
> I have had an idea for some time to re-implement how GIL works, but
> nothing radical. On purpose I'm saying that I don't want to "remove"
> GIL, but to modify it, so people who follow your list don't get too
> excited.
>
> Let me describe what I want to accomplish and what kind of results I'm
> looking for, and also hear about any design or implementation issues
> we may face.
>
> I want threads to work in parallel as they already do, with very
> little time spent waiting on other threads where the GIL would
> normally serialize them. Actually, I want to use the time reserved
> for the GIL to do something else.
>
> To be clear, I don't need a JIT, so let's leave that out of the
> discussion for later.
>
> First step:
> -------------
> My solution requires a wait-free, lock-free queue to which many
> threads can append/push atomically. Every thread created, including
> the main thread, has to have (at least) one such queue, and let's say
> the queue has a pointer to the thread it belongs to.
>
> Not only a queue is required, but also a condition variable (such as
> a pthread_cond_t) per thread.
>
> What I imagine is a high-level thread object that consists of a
> low-level thread, a low-level wait-free lock-free queue, and a
> low-level condition variable. Imagine that this high-level thread
> object is local to each OS thread created. From now on I will call
> the high-level thread object just a "thread". Every mutable object
> created has to have a pointer to the thread in which it was created.
> As a result, we can know the origin of a mutable object by inspecting
> the object itself, and as a consequence also know the low-level
> thread, queue and condition variable associated with it.

You should look more at the Rust model, where mutable objects need a
token for reads/writes. In Python it's an utter mess, because all
classes are mutable, so you need locking (or some sort of
token-passing) even for object instantiation. This is a major
nightmare.
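The first step above could be sketched in Python roughly as follows. The names (HLThread, MutableObj) are made up for illustration, and queue.SimpleQueue plus threading.Event merely stand in for the wait-free queue and pthread_cond_t the proposal actually requires:

```python
import threading
import queue

class HLThread:
    """Sketch of the proposed high-level thread object: an OS thread
    paired with an inbox of deferred operations and a per-thread
    wakeup signal standing in for the pthread_cond_t.

    Note: queue.SimpleQueue is only a stand-in for the wait-free
    lock-free queue; it takes a lock internally, so a real
    implementation would need C-level atomic push/pop.
    """
    def __init__(self):
        self.inbox = queue.SimpleQueue()   # items pushed by other threads
        self.wakeup = threading.Event()    # signalled when our remote op is done

class MutableObj:
    """Every mutable object carries a pointer to the thread that
    created it, so any eval loop can recover the owner's queue and
    condition variable from the object itself."""
    def __init__(self, owner, value):
        self.owner = owner                 # the creating HLThread
        self.value = value
```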

>
> Having this special field is a naive approach; of course we could
> instead have allocation areas per thread and a smarter GC, but let's
> forget about that for now.
>
> Second step:
> -----------------
> Every time the "ceval"-equivalent function observes a mutable object
> during bytecode interpretation, it checks whether the local thread is
> the same as the object's thread.
>
> If they are the same, proceed with interpreting the operation in the local ceval.
>
> If they are different, then:
>     - let's call the local thread T
>     - let's call the operation O
>     - take the low-level queue of the object's thread; let's call it Q
>     - group O and T into an item and push it onto Q
>     - make T wait on its condition variable
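A minimal sketch of this dispatch rule, under the same stand-in assumptions as before (illustrative names only; a threading.Event per thread approximates the condition-variable handshake, and a plain list serves as the result slot the owner fills in):

```python
import threading
import queue

class HLThread:
    """Minimal stand-in for the high-level thread object: an inbox of
    deferred operations plus a wakeup signal (the condition variable)."""
    def __init__(self):
        self.inbox = queue.SimpleQueue()
        self.wakeup = threading.Event()

def execute(local, op, obj):
    """Run op(obj) locally if we own obj; otherwise push it onto the
    owner's queue and park until the owner signals completion."""
    if obj.owner is local:
        return op(obj)                          # same thread: interpret directly
    local.wakeup.clear()
    result = []                                 # slot the owner fills in
    obj.owner.inbox.put((op, obj, local, result))
    local.wakeup.wait()                         # wait on our condition variable
    return result[0]
```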
>
> Third step:
> -------------
> Let's imagine we keep the old-style GIL machinery that counts opcodes
> executed, but after N opcodes, instead of acquiring the GIL, the
> thread processes its queue. This is done by popping operation (O) and
> thread (T) items from the local thread's queue and executing the
> operations serially. After each executed operation it signals the
> condition variable of the thread (T) that was pushed in the same
> queue item as the operation. Once the signal is received, the waiting
> thread resumes execution, while the thread that was processing the
> queue continues executing queued operations.
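Sketched in Python under the same assumptions as above (illustrative names; threading.Event plays the role of the per-thread condition variable, and the opcode counter that decides when to drain is left out):

```python
import threading
import queue

class HLThread:
    """Minimal stand-in: an inbox of queued operations plus a wakeup
    signal used as the per-thread condition variable."""
    def __init__(self):
        self.inbox = queue.SimpleQueue()
        self.wakeup = threading.Event()

def drain_inbox(local):
    """What the eval loop would do every N opcodes instead of taking
    the GIL: serially execute each queued operation, signalling the
    sender after each one so it can resume."""
    while True:
        try:
            op, obj, sender, result = local.inbox.get_nowait()
        except queue.Empty:
            return                   # nothing queued; back to interpreting
        result.append(op(obj))       # execute on behalf of the remote thread
        sender.wakeup.set()          # wake the thread parked in step two
```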
>
> Conclusion:
> ---------------
> There is a ceval per thread, and all cevals run at the same time. In
> the best case they never wait on each other, because threads do not
> touch objects created in other threads. In the worst case there is
> only ever one thread executing an operation while all the others wait
> for it to finish, which is close to how the GIL already works.
>
> Special care has to be taken for threads that are about to finish:
> what happens to their objects if they are referenced by objects
> belonging to other threads?

For what it's worth, this is *hard*: you need to write a new GC for starters.

>
> So I have also some questions:
> - How flexible is to modify GIL in pypy?

Very; it's a translation transformation.

> - Where should I look?

Look at whatever calls objectmodel.invoke_around_extcall, starting from module/thread.

> - Do I need to fully translate pypy in order to test it?

No, you can experiment on smaller languages (we do it all the time for
the JIT, GC, etc.).

> - I need a wait-free, lock-free queue, so what kind of low-level
> atomic primitives does RPython have to offer?

None of what you want; you would need to implement it yourself.

> - I need a special field on all mutable objects; how can I do that?

You can do this by checking in the rtyper whether all the fields are immutable.

>
> I do not expect any performance boost, just to establish correctness
> of the approach. Perhaps I'm wrong.
>
> What do you think, is this a practical approach?

It requires a lot of work, but in theory it can work. The
token-passing for mutable objects puts a heavy burden on, say,
allocations or global lookups. I don't think you would end up with
better performance than STM, although feel free to try.

Cheers,
fijal
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev
