Re: [pypy-dev] problem after merging of jit-virtual_state

2011-03-14 Thread Antonio Cuni
Hi Hakan,
thank you for the deep explanation. Now I understand what's going on :-)

So, I changed test_pypy_c_new to add a sys.setcheckinterval(some-huge-number),
so that the bridge from the signal/thread counter is never created and we can
forget about it.
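
Concretely, something along these lines (the exact value is arbitrary, as
long as it is large enough that the counter never expires during the test):

    import sys
    sys.setcheckinterval(10**9)   # any sufficiently huge value works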

Now, if I understand correctly, the two remaining loops are one for the case
"i non virtual" and the other for the case "i virtual", although both lead to
the same operations. I think this is the expected behavior in this case, so
are you ok if I just fix test_f1 to expect two loops?
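
For concreteness, the loop under test has this general shape (a hypothetical
reduction for illustration, not the literal test_f1 body):

    def f1(n):
        i = 0
        while i < n:          # outer loop: the bridge re-enters here
            j = 0
            while j < n:      # inner loop: traced first, becomes Loop0
                j += 1
            i += 1            # depending on the entry path, i is virtual or not
        return i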

ciao,
Anto

On 13/03/11 11:12, Hakan Ardo wrote:
> Hi,
> this is what happens here:
> 
> 1. The inner loop is traced and Loop0 is produced with preamble Loop1
> 
> 2. A bridge from Guard3 (the test in the while) back to Loop0 is
> traced (i.e. the remaining parts of the outer loop)
> 
> 3. At the end of this bridge the VirtualState does not match the
> VirtualState of Loop0, so the loop is retraced
> 
> 4. The VirtualState of the newly traced version of the loop does not
> match the VirtualState at the end of the bridge so the bridge has to
> jump to the preamble instead of jumping to the new specialized version
> of the loop.
> 
> 5. A bridge from Guard6 (signal/thread counter) is traced and the same
> thing happens for this bridge.
> 
> This means that the additional two versions of the loop will never be
> used and should hopefully be removed by the gc...
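>
> In pseudo-code, the choice made when a bridge reaches its final jump is
> roughly this (a sketch of the logic only: retrace_with and entry_state are
> made-up names, though VirtualState.generalization_of does exist):
>
>     def close_bridge(bridge_state, loop_versions, preamble):
>         # try every compiled version of the loop first
>         for loop in loop_versions:
>             if bridge_state.generalization_of(loop.entry_state):
>                 return loop              # states match: jump straight there
>         new_loop = retrace_with(bridge_state)    # step 3 above
>         loop_versions.append(new_loop)
>         if bridge_state.generalization_of(new_loop.entry_state):
>             return new_loop
>         return preamble                  # step 4 above: fall back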
> 
> So there are two issues:
> 
> A. The additional specialized versions created do not become usable.
> This is the issue I'm working on in the jit-usable_retrace branch. The
> idea there is to have the retrace inherit the OptValues of the
> jumpargs at the end of the bridge. This will become a fairly large
> change, functionality-wise...
> 
> B. The VirtualStates differ in the first place, forcing a retrace.
> This is probably fixable by introducing some more cases in
> NotVirtualInfo._generate_guards(). The jit-usable_retrace branch
> contains more cases than trunk, don't know if those are enough for
> this test though...
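>
> Schematically, those cases follow this pattern (a rough sketch with
> approximate names and levels, not the actual source):
>
>     def _generate_guards(self, other, box, cpu, extra_guards):
>         # the bridge ("other") only knows the box is nonnull, while the
>         # loop ("self") expects a known class: emit the missing guard at
>         # the end of the bridge instead of forcing a retrace
>         if self.level == LEVEL_KNOWNCLASS and other.level == LEVEL_NONNULL:
>             op = ResOperation(rop.GUARD_CLASS, [box, self.known_class], None)
>             extra_guards.append(op)
>             return
>         raise InvalidLoop    # no salvage: the states really differ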
> 
> Note however that
> jit/metainterp/test/test_nested_loops_discovered_by_bridge in
> test_loop_unroll.py, which contains the same loop for a simple
> interpreter, does work nicely, without the issues above.
> 
> On Sat, Mar 12, 2011 at 10:59 PM, Hakan Ardo  wrote:
>> On Sat, Mar 12, 2011 at 8:34 PM, Antonio Cuni  wrote:
>>> Hi Hakan,
>>>
>>> On 12/03/11 19:25, Hakan Ardo wrote:
>>>> Yes, this is probably the VirtualState checking. It will retrace a
>>>> loop whenever the VirtualState at the end of a bridge differs from the
>>>> VirtualState at the beginning of the compiled trace (any of the
>>>> compiled traces). This might indeed produce an identical trace if we
>>>> are unlucky, but the idea is that this should only happen rarely.
>>>
>>> ok, that's clear. So, hopefully this particular example looks a bit bad, but
>>> in general it should not be an issue. It'd be nice to have a way to check
>>> this thesis, but I agree that it's a bit hard.
>>
>> We should probably log the VirtualState together with the produced
>> loops and bridges. That would allow us to see how they differ when a
>> new version of a loop is traced. There are __repr__ methods I've been
>> using for that while debugging. They might need some rpythonizing to
>> translate though...
>>
>>>
>>>> This is because the VirtualState at the beginning of a trace is the
>>>> state of all the OptValues of the inputargs produced during the
>>>> optimization of the trace. This does not have to be the most general
>>>> state for which the trace is usable (which would be hard to calculate
>>>> I'm afraid).
>>>
>>> so, if I understand correctly, this is what happens:
>>>
>>> 1. we trace, optimize and compile loop A
>>>
>>> 2. after a while, we trace, optimize and compile a bridge B which then jumps
>>> back to A; by chance, the bridge looks the same as the loop
>>>
>>> Am I right?
>>
>> Maybe, I've not had the chance to look into any details yet. I'll do
>> that tomorrow...
>>
>>>
>>>> A few cases that would (most likely) result in identical traces are
>>>> salvaged in NotVirtualInfo._generate_guards by producing some extra
>>>> guards at the end of a bridge to make the VirtualState there match the
>>>> VirtualState of a compiled trace. This is however only done if the
>>>> guards would (most likely) not fail for the traced iteration.
>>>>
>>>> I'll look into what's happening in this particular test...
>>>
>>> I just did a quick check because I'm in a hurry, but from what I see we get
>>> three actual *loops*, not bridges.
>>
>> So if it's the same loop traced several times, they should all have the
>> same preamble, and the preamble would have two bridges leading to the
>> two second versions of the loop. The preamble and its two bridges
>> should end with different VirtualStates. The loops should be
>> specialized to the different VirtualStates, but if the VirtualStates
>> are similar enough (but not equal) they might consist of the same operations.

Re: [pypy-dev] problem after merging of jit-virtual_state

2011-03-14 Thread Hakan Ardo
On Mon, Mar 14, 2011 at 11:49 AM, Antonio Cuni  wrote:
> Hi Hakan,
> thank you for the deep explanation. Now I understand what's going on :-)
>
> So, I changed test_pypy_c_new to add a sys.setcheckinterval(some-huge-number),
> so that the bridge from the signal/thread counter is never created and we can
> forget about it.
>
> Now, if I understand correctly, the two remaining loops are one for the case
> "i non virtual" and the other for the case "i virtual", although both lead to
> the same operations. I think this is the expected behavior in this case, so
> are you ok if I just fix test_f1 to expect two loops?

Yes

>
> ciao,
> Anto
>
> On 13/03/11 11:12, Hakan Ardo wrote:
>> Hi,
>> this is what happens here:
>>
>> 1. The inner loop is traced and Loop0 is produced with preamble Loop1
>>
>> 2. A bridge from Guard3 (the test in the while) back to Loop0 is
>> traced (i.e. the remaining parts of the outer loop)
>>
>> 3. At the end of this bridge the VirtualState does not match the
>> VirtualState of Loop0, so the loop is retraced
>>
>> 4. The VirtualState of the newly traced version of the loop does not
>> match the VirtualState at the end of the bridge so the bridge has to
>> jump to the preamble instead of jumping to the new specialized version
>> of the loop.
>>
>> 5. A bridge from Guard6 (signal/thread counter) is traced and the same
>> thing happens for this bridge.
>>
>> This means that the additional two versions of the loop will never be
>> used and should hopefully be removed by the gc...
>>
>> So there are two issues:
>>
>> A. The additional specialized versions created do not become usable.
>> This is the issue I'm working on in the jit-usable_retrace branch. The
>> idea there is to have the retrace inherit the OptValues of the
>> jumpargs at the end of the bridge. This will become a fairly large
>> change, functionality-wise...
>>
>> B. The VirtualStates differ in the first place, forcing a retrace.
>> This is probably fixable by introducing some more cases in
>> NotVirtualInfo._generate_guards(). The jit-usable_retrace branch
>> contains more cases than trunk, don't know if those are enough for
>> this test though...
>>
>> Note however that
>> jit/metainterp/test/test_nested_loops_discovered_by_bridge in
>> test_loop_unroll.py, which contains the same loop for a simple
>> interpreter, does work nicely, without the issues above.
>>
>> On Sat, Mar 12, 2011 at 10:59 PM, Hakan Ardo  wrote:
>>> On Sat, Mar 12, 2011 at 8:34 PM, Antonio Cuni  wrote:
>>>> Hi Hakan,
>>>>
>>>> On 12/03/11 19:25, Hakan Ardo wrote:
>>>>> Yes, this is probably the VirtualState checking. It will retrace a
>>>>> loop whenever the VirtualState at the end of a bridge differs from the
>>>>> VirtualState at the beginning of the compiled trace (any of the
>>>>> compiled traces). This might indeed produce an identical trace if we
>>>>> are unlucky, but the idea is that this should only happen rarely.
>>>>
>>>> ok, that's clear. So, hopefully this particular example looks a bit bad, but
>>>> in general it should not be an issue. It'd be nice to have a way to check
>>>> this thesis, but I agree that it's a bit hard.
>>>
>>> We should probably log the VirtualState together with the produced
>>> loops and bridges. That would allow us to see how they differ when a
>>> new version of a loop is traced. There are __repr__ methods I've been
>>> using for that while debugging. They might need some rpythonizing to
>>> translate though...
>>>

>>>>> This is because the VirtualState at the beginning of a trace is the
>>>>> state of all the OptValues of the inputargs produced during the
>>>>> optimization of the trace. This does not have to be the most general
>>>>> state for which the trace is usable (which would be hard to calculate
>>>>> I'm afraid).
>>>>
>>>> so, if I understand correctly, this is what happens:
>>>>
>>>> 1. we trace, optimize and compile loop A
>>>>
>>>> 2. after a while, we trace, optimize and compile a bridge B which then jumps
>>>> back to A; by chance, the bridge looks the same as the loop
>>>>
>>>> Am I right?
>>>
>>> Maybe, I've not had the chance to look into any details yet. I'll do
>>> that tomorrow...
>>>

>>>>> A few cases that would (most likely) result in identical traces are
>>>>> salvaged in NotVirtualInfo._generate_guards by producing some extra
>>>>> guards at the end of a bridge to make the VirtualState there match the
>>>>> VirtualState of a compiled trace. This is however only done if the
>>>>> guards would (most likely) not fail for the traced iteration.
>>>>>
>>>>> I'll look into what's happening in this particular test...
>>>>
>>>> I just did a quick check because I'm in a hurry, but from what I see we get
>>>> three actual *loops*, not bridges.
>>>
>>> So if it's the same loop traced several times, they should all have the
>>> same preamble, and the preamble would have two bridges leading to the
>>> two second versions of the loop. The preamble and its two bridges
>>> should end with different VirtualStates. The loops should be
>>> specialized to the different VirtualStates, but if the VirtualStates
>>> are similar enough (but not equal) they might consist of the same operations.

[pypy-dev] Thinking about the GIL

2011-03-14 Thread Laura Creighton

Robert Hancock hosted a BoF at PyCon about concurrency and multiprocessing.
I went there looking to find out how other people were doing things,
especially looking for information about how other languages handled
things.  It would be nice to kill the GIL, if only we knew of a
brilliant way to do this.

Unfortunately, I was one year too late for this discussion.  This is
what Robert Hancock, David Beazley, Peter Portante and others discussed at PyCon
_last year_.  So I asked Robert Hancock for the notes he took then.
(I continue after this forwarded message)

--- Forwarded Message

Return-Path: hancock.rob...@gmail.com
Delivery-Date: Mon Mar 14 17:09:51 2011
Return-Path: 
Subject: Re: please send me the notes you took last year
To: Laura Creighton 

These are the books that I mentioned:
Machine Learning: An Algorithmic Perspective
http://www.amazon.com/gp/product/1420067184
 
I found this more approachable than the Bishop, and a number of the
examples are in Python.

Introduction to Data Mining
http://www.amazon.com/gp/product/0321321367

I've only started this, but it goes nicely with David Mease's Google Tech Talk
series. http://www.youtube.com/watch?v=zRsMEl6PHhM

These were the approaches we discussed:

1.  Make all IO non-blocking and mediate the processes like greenlets (see
the sketch after this list).  This does not allow you to take advantage of
the OS-level thread scheduler, which is far more sophisticated than
greenlets.  See the Linux kernel specifications for the details of the
multi-level feedback queue.

2. Construct a multi-level feedback queue within Python.  This is
extraordinarily complex to implement.  Why duplicate what
already exists?

3. Do we need to maintain compatibility with being able to call out to C
functions?  The primary complaint about the GIL is that it does not
efficiently handle CPU bound processes and multi-cores.  Running sequential 
processes in threads on multi-cores can actually slow down the processes.

4. Who has already solved this problem as part of the language?
-   Erlang (No one knew the nitty gritty details.)

-   Go - based on Tony Hoare's CSP and the work done on Plan 9 at Bell
Labs.  Uses the system scheduler and creates its own mini-threads (4k).
Need to investigate the source code online.  Goroutines do not have OS thread
affinity; they can multiplex over multiple threads.

-   Java - Early on, Java used several versions of green threads, but now
uses system threads.  The JVM punts to the OS.
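
A toy illustration of approach 1: tasks yield at would-be blocking points
and a single loop multiplexes them (the greenlet idea, minus the C-level
stack switching):

    from collections import deque

    def countdown(name, n):
        while n:
            print("%s %d" % (name, n))
            n -= 1
            yield                 # cooperative yield instead of blocking I/O

    def run(tasks):
        ready = deque(tasks)
        while ready:
            task = ready.popleft()
            try:
                next(task)        # resume the task up to its next yield
                ready.append(task)
            except StopIteration:
                pass              # task finished, drop it

    run([countdown("a", 3), countdown("b", 2)])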

Conclusions
--
1.  Do not reinvent the wheel!  Many people have worked decades on this
problem.  Leverage their expertise.

2.  Coroutines are frequently better than threads, but they do not scale:
each coroutine must be restarted in the thread where it was spawned.  See
greenlet.c.  Greenlets are also chained and have mutual dependencies.
The order of execution is arbitrary, with no method for priorities.

3.  Investigate if there is an alternative to the current method of calling
external C objects.

4. Dave did a POC on priorities:
http://dabeaz.blogspot.com/2010/02/revisiting-thread-priorities-and-new.html

5.  Everyone agreed that some type of priority mechanism is a good idea, but
wanted to see what Unladen Swallow does. (As of March 2011 Google is no
longer actively developing this project.)


References
----------
Dave Beazley - GIL Wars
Dave Beazley - Yieldable Threads http://www.dabeaz.com/blog.html
Linux Kernel http://goo.gl/RkxVs
Erlang
Go - golang.org
CSP - Tony Hoare http://www.usingcsp.com/cspbook.pdf

I spoke with Peter Portante yesterday, and he would be very interested in
participating even though he has very little free time.  Peter works at HP
and worked on their OS threading model.  Also, see his PyCon 2010 talk on
non-blocking IO and the 2011 talk on coroutines.

Let me know if you have any questions.

Bob Hancock

Blog - www.bobhancock.org
Twitter - bob_hancock and nycgtug
--- End of Forwarded Message

And, indeed, Peter Portante is very interested in thinking about doing
without the GIL.  He's already sent me this:

Date:Sun, 13 Mar 2011 16:42:00 -0400
To:  Laura Creighton 
From:Peter Portante 
Subject: Re: [pypy-dev] possibly of use for our documentation

Return-Path: peter.a.porta...@gmail.com
Delivery-Date: Sun Mar 13 21:42:21 2011
Return-Path: 
Hi Laura,

Just left PyCon and heard about talk of PyPy removing the GIL.

I worked on Tru64 UNIX's thread library for 8 years.  If there is anything I
can do to help with this effort, please let me know.

Thanks,

-peter
-

Note: I have never promised anybody anything.  This was a 'please
educate me' appeal.  But Bob Hancock is coming back this afternoon
to talk with us.

Anybody got any questions they want to make sure I ask him?

Laura


Re: [pypy-dev] Thinking about the GIL

2011-03-14 Thread Timothy Baldridge
I guess I'm missing something, but what's wrong with simply ripping
out the GIL? In C# we have threads, C FFI (via PInvoke), and never
have any major issues with threading. It seems to me that the GIL is
only needed because assumptions were made when writing the
interpreter. I don't know if it's full of global variables or
something, but can anyone explain why a GIL is needed at all? I've
done quite a bit of multi-threading programming, and I fail to see the
need for a GIL.

> Why duplicate what
> already exists?

Timothy


Re: [pypy-dev] Thinking about the GIL

2011-03-14 Thread Benjamin Peterson
2011/3/14 Timothy Baldridge :
> I guess I'm missing something, but what's wrong with simply ripping
> out the GIL? In C# we have threads, C FFI (via PInvoke), and never
> have any major issues with threading. It seems to me that the GIL is
> only needed because assumptions were made when writing the
> interpreter. I don't know if it's full of global variables or
> something, but can anyone explain why a GIL is needed at all? I've
> done quite a bit of multi-threading programming, and I fail to see the
> need for a GIL.

Many reasons, the biggest being that Python data structures aren't thread-safe.



-- 
Regards,
Benjamin


Re: [pypy-dev] Thinking about the GIL

2011-03-14 Thread Amaury Forgeot d'Arc
Hi,

2011/3/14 Timothy Baldridge :
> I guess I'm missing something, but what's wrong with simply ripping
> out the GIL? In C# we have threads, C FFI (via PInvoke), and never
> have any major issues with threading. It seems to me that the GIL is
> only needed because assumptions were made when writing the
> interpreter. I don't know if it's full of global variables or
> something, but can anyone explain why a GIL is needed at all? I've
> done quite a bit of multi-threading programming, and I fail to see the
> need for a GIL.

The GIL greatly simplifies multi-threaded programming in Python.
For example, you are guaranteed that someList.append(x) won't run into
some race condition and crash the program.
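
Concretely, this always ends up with 200000 elements and no crash, with no
locking on the user's side, because each append executes atomically with
respect to the interpreter loop:

    import threading

    lst = []

    def work():
        for _ in range(100000):
            lst.append(1)     # safe under the GIL: no lost or torn appends

    threads = [threading.Thread(target=work) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert len(lst) == 200000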

In CPython, simply incrementing the reference count of an object needs some
synchronization between threads. And in PyPy, garbage collectors are
not (yet) thread-safe.
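
The shape of the problem is easy to demonstrate at the Python level: an
unsynchronized read-modify-write loses updates. The C-level refcount bump
is exactly such an operation, and it is only safe because the GIL is never
released in the middle of it. (The Python-level increment below spans
several bytecodes, so it can lose updates even under the GIL:)

    import threading

    n = 0

    def bump():
        global n
        for _ in range(100000):
            n += 1            # load, add, store: three steps, not atomic

    threads = [threading.Thread(target=bump) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(n)                  # typically less than 200000: lost updates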

See better explanations about the GIL in CPython:
http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock
http://effbot.org/pyfaq/can-t-we-get-rid-of-the-global-interpreter-lock.htm

-- 
Amaury Forgeot d'Arc


Re: [pypy-dev] Thinking about the GIL

2011-03-14 Thread Benjamin Peterson
2011/3/14 Timothy Baldridge :
> They may not be thread-safe, but as far as a program goes, do we
> really care? If I have two threads adding items to the same list,
> should I really be expecting the interpreter to keep things straight?
> What's wrong with forcing the user to lock structures before editing
> them? This is something that Java, C#, and C++ all require.

Yes, the Python interpreter should never crash because of user mistakes.
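
For comparison, the user-side discipline Timothy describes looks like this
in Python; the GIL's guarantee is that even without it, the interpreter's
own structures stay intact:

    import threading

    lock = threading.Lock()
    shared = []

    def append_safely(x):
        with lock:            # explicit user locking, Java/C#/C++ style
            shared.append(x)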



-- 
Regards,
Benjamin