Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Nir Aides
On Mon, Aug 29, 2011 at 8:42 PM, Jesse Noller  wrote:
> On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou  wrote:
>>
>> That sanitization is generally useful, though. For example if you want
>> to use any I/O after a fork().
>
> Oh! I don't disagree; I'm just against the removal of the ability to
> mix multiprocessing and threads; which it does internally and others
> do in every day code.

I am not familiar with the python-dev definition of deprecation, but
when I used the word in the bug discussion I meant advertising to
users that they should not mix threading and forking, since that mix is
and will remain broken by design; I did not mean removal or crippling
of functionality.

“When I use a word,” Humpty Dumpty said, in rather a scornful tone,
“it means just what I choose it to mean—neither more nor less.” -
Through the Looking-Glass

(btw, my tone is not scornful)

And there is no way around it - the mix in general is broken, with an
atfork mechanism or without it.
People can choose to keep doing it in their every day code at their
own risk, be it significantly high or insignificantly low.
But the documentation should explain the problem clearly.
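
To make the failure mode concrete, here is a minimal sketch. It is not taken
from the bug report, the function name is made up for illustration, and it
assumes a POSIX system with os.fork and a recent Python 3. A helper thread
holds a lock while the main thread forks; the child inherits the lock in its
locked state, but not the thread that would release it:

```python
import os
import threading


def child_can_acquire_after_fork(timeout=1.0):
    """Fork while another thread holds a lock; report whether the child got it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        # Keep the lock held until the parent tells us to let go.
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the lock is really held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here. The holder thread is
        # gone, but the copied lock is still marked as locked.
        ok = lock.acquire(timeout=timeout)
        os._exit(0 if ok else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

The timeout is only there so the demonstration terminates; real code that
takes the lock without a timeout would simply hang in the child.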

As for the internal use of threads in the multiprocessing module I
proposed a potential way to "sanitize" those particular worker
threads:
http://bugs.python.org/issue6721#msg140402

If it makes sense and entails changes to internal multiprocessing
worker threads, those changes could be applied as bug fixes to Python
2.x and previous Python 3.x releases.

This does not contradict adding the ability to spawn now, and making
it the only option in the future. I agree that this is the
"saner" approach, but it is a new feature, not a bug fix.

Nir
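
For readers on later versions: as far as I know, since Python 3.4 the
multiprocessing module lets the caller choose how children are started,
which is the kind of "spawn" feature discussed above. A minimal sketch, with
made-up helper names:

```python
import multiprocessing


def spawn_context():
    """Return a context whose children start in a fresh interpreter
    instead of being forked from the (possibly threaded) parent."""
    return multiprocessing.get_context("spawn")


def run_in_spawned_child(target, *args):
    """Start target(*args) in a spawned child and return its exit code.

    target must be importable by the child (a module-level function), and
    the calling script should sit behind an `if __name__ == "__main__":`
    guard, because the child re-imports the main module.
    """
    proc = spawn_context().Process(target=target, args=args)
    proc.start()
    proc.join()
    return proc.exitcode
```

Because the child starts from a clean interpreter, it does not inherit locks
that other threads held in the parent, at the cost of the slower startup
measured elsewhere in this thread.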
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-29 Thread Nir Aides
On Mon, Aug 29, 2011 at 8:16 PM, Antoine Pitrou  wrote:
>
> On Mon, 29 Aug 2011 13:03:53 -0400 Jesse Noller  wrote:
> >
> > Yes; but spawning and forking are both slow to begin with - it's
> > documented (I hope heavily enough) that you should spawn
> > multiprocessing children early, and keep them around instead of
> > constantly creating/destroying them.
>
> I think fork() is quite fast on modern systems (e.g. Linux). exec() is
> certainly slow, though.

On my system, the time it takes worker code to start is:

40 usec with thread.start_new_thread
240 usec with threading.Thread().start
450 usec with os.fork
1 ms with multiprocessing.Process.start
25 ms with subprocess.Popen to start a trivial script.

so os.fork has latency similar to threading.Thread().start (about 2x),
while spawning a new interpreter with subprocess.Popen is roughly 50 to
100 times slower.
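
The numbers above are from my system only. A rough sketch of how such
latencies could be measured is below; it is not the script I used, the
function names are made up, and it assumes a POSIX system. It times how long
it takes until the parent sees that the worker has started running, so it
includes a little notification overhead:

```python
import os
import threading
import time


def thread_start_latency():
    """Seconds from Thread creation until the worker has signalled it runs."""
    started = threading.Event()
    t0 = time.perf_counter()
    t = threading.Thread(target=started.set)
    t.start()
    started.wait()
    elapsed = time.perf_counter() - t0
    t.join()
    return elapsed


def fork_start_latency():
    """Seconds from os.fork() until the child has written one byte to a pipe."""
    r, w = os.pipe()
    t0 = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os.write(w, b"x")  # child: announce that worker code is running
        os._exit(0)
    os.read(r, 1)
    elapsed = time.perf_counter() - t0
    os.close(r)
    os.close(w)
    os.waitpid(pid, 0)
    return elapsed


def median_usec(measure, repeats=20):
    """Median of several measurements, in microseconds."""
    samples = sorted(measure() for _ in range(repeats))
    return samples[len(samples) // 2] * 1e6
```

Calling median_usec(thread_start_latency) and median_usec(fork_start_latency)
gives figures that can be compared with the table; absolute values will
differ between machines and Python versions.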


Re: [Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-26 Thread Nir Aides
Another facet of the discussion is whether to deprecate the mixing of
the threading and processing modules, and what to do about the
multiprocessing module, which is implemented with worker threads.



On Tue, Aug 23, 2011 at 11:29 PM, Antoine Pitrou wrote:

> On Tuesday, 23 August 2011 at 22:07 +0200, Charles-François Natali wrote:
> > 2011/8/23 Antoine Pitrou :
> > > Well, I would consider the I/O locks the most glaring problem. Right
> > > now, your program can freeze if you happen to do a fork() while e.g.
> > > the stderr lock is taken by another thread (which is quite common when
> > > debugging).
> >
> > Indeed.
> > To solve this, a similar mechanism could be used: after fork(), in the
> > child process:
> > - just reset each I/O lock (destroy/re-create the lock) if we can
> > guarantee that the file object is in a consistent state (i.e. that all
> > the invariants hold). That's the approach I used in my initial patch.
>
> For I/O locks I think that would work.
> There could also be a process-wide "fork lock" to serialize locks and
> other operations, if we want 100% guaranteed consistency of I/O objects
> across forks.
>
> > - call a fileobject method which resets the I/O lock and sets the file
> > object to a consistent state (in other words, an atfork handler)
>
> I fear that the complication with atfork handlers is that you have to
> manage their lifecycle as well (i.e., when an IO object is destroyed,
> you have to unregister the handler).
>
> Regards
>
> Antoine.
>
>
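
The two ideas quoted above (re-create the lock in the child, and the worry
about handler lifecycle) can be sketched together. This is only an
illustration, not the patch from the issue: the class and helper names are
made up, and it relies on os.register_at_fork, which as far as I know exists
since Python 3.7 and has no matching unregister call. Registering one
module-level handler that walks a weak set sidesteps the need to unregister
per-object handlers when objects are destroyed:

```python
import os
import threading
import weakref

# Objects whose lock must be re-created in a forked child. Weak references
# mean a destroyed object simply drops out; nothing has to be unregistered.
_fork_reset_registry = weakref.WeakSet()


def _reset_locks_in_child():
    for obj in list(_fork_reset_registry):
        obj._lock = threading.Lock()


os.register_at_fork(after_in_child=_reset_locks_in_child)


class ForkResetLock:
    """Hypothetical lock wrapper that starts out unlocked in a forked child."""

    def __init__(self):
        self._lock = threading.Lock()
        _fork_reset_registry.add(self)

    def acquire(self, blocking=True, timeout=-1):
        return self._lock.acquire(blocking, timeout)

    def release(self):
        self._lock.release()


def child_gets_fresh_lock():
    """Fork while the lock is held; return True if the child could take it."""
    lk = ForkResetLock()
    lk.acquire()
    pid = os.fork()
    if pid == 0:
        os._exit(0 if lk.acquire(timeout=0.5) else 1)
    _, status = os.waitpid(pid, 0)
    lk.release()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Note that this only resets the lock. It says nothing about whether the data
the lock protects was in a consistent state at fork time, which is the hard
part of the problem.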


[Python-Dev] issue 6721 "Locks in python standard library should be sanitized on fork"

2011-08-23 Thread Nir Aides
Hi all,

Please consider this invitation to stick your head into an interesting
problem:
http://bugs.python.org/issue6721

Nir


Re: [Python-Dev] Licensing

2010-07-06 Thread Nir Aides
I take "...running off with the good stuff and selling it for profit" to
mean "creating derivative work and commercializing it as proprietary code",
which you cannot do with GPL-licensed code. Also, while the GPL does not
prevent selling copies for profit, it does not make it very practical either.


On Tue, Jul 6, 2010 at 9:44 AM, Ben Finney wrote:

> Guido van Rossum  writes:
>
> > A secondary reasoning for some open source licenses might be to
> > prevent others from running off with the good stuff and selling it for
> > profit. The GPL is big on that […]
>
> Really, it's not. Please stop spreading this canard.
>
> The GPL explicitly and deliberately grants the freedom to sell the work
> for profit. Every copyright holder who grants license under the terms of
> the GPL is explicitly saying “you can sell this software for any price
> you like” <http://www.gnu.org/philosophy/selling.html>.
>
> Whatever other complaints people may have against the GPL, it's simply
> *false* to claim what Guido did above. Please stop it.
>
> --
>  \“We cannot solve our problems with the same thinking we used |
>  `\   when we created them.” —Albert Einstein |
> _o__)  |
> Ben Finney
>


Re: [Python-Dev] Fixing the GIL (with a BFS scheduler)

2010-05-17 Thread Nir Aides
I would like to restart this thread with 2 notes from the lively discussion:

a) Issue 7946 (and this discussion?) concerns Python 3.2
b) The GIL problems are not specific to OSX. The old and new GIL misbehave
on GNU/Linux and Windows too.

[Putting on anti-frying-pan helmet]

Nir


[Python-Dev] Fixing the GIL (with a BFS scheduler)

2010-05-16 Thread Nir Aides
Hi all,

Here is a second (last?) attempt at getting traction on fixing the GIL (is
it broken?) with BFS (WTF?).
So don't be shy (don't be too rude either) since ignoring counts as down
voting.

Relevant Python issue: http://bugs.python.org/issue7946


*Bottom line first*

I submitted an implementation of BFS (
http://ck.kolivas.org/patches/bfs/sched-BFS.txt) as a patch to the GIL,
which to the extent I have tested it, behaves nicely on Windows XP, Windows
7, GNU/Linux with either CFS or O(1) schedulers, 1/2/4 cores, laptop,
desktop and VirtualBox VM guest (some data below).

The patch is still a work in progress and needs work on style,
moving code to where it belongs, test code, etc. Nevertheless, Python core
developers recommended that I (re)post to python-dev for discussion already.


*So is the GIL broken?*

There seems to be some disagreement on that question among Python core
developers (unless you all agree it is not broken :) ). Some developers
maintain the effects described by David Beazley do not affect real world
systems. Even I took the role of a devil's advocate in a previous
discussion, but in fact I think that Python, being a general purpose
language, is similar to the OS in that regard. It is used across many
application domains, platforms, and development paradigms, just as OS
schedulers are, and therefore accepting thread scheduling with such
properties as a fact of life is not a good idea.

I was first bitten by the original GIL last year while testing a system, and
found David's research while looking for answers, and later had to work
around that problem in another system. Here are other real world cases:

1) Zope people hit this back in 2002 and documented the problem with
interesting insight:
http://www.zope.org/Members/glpb/solaris/multiproc
"I have directly observed a 30% penalty under MP constraints when the
sys.setcheckinterval value was too low (and there was too much GIL
thrashing)."

http://www.zope.org/Members/glpb/solaris/report_ps
"A machine that's going full-throttle isn't as bad, curiously enough --
because the other CPU's are busy doing real work, the GIL doesn't have as
much opportunity to get shuffled between CPUs.  On a MP box it's very
important to set sys.setcheckinterval() up to a fairly large number, I
recommend pystones / 50 or so."

2) Python mailing list - 2005
http://mail.python.org/pipermail/python-list/2005-August/336286.html
"The app suffers from serious performance degradation (compared to
pure c/C++) and high context switches that I suspect the GIL unlocking
may be aggravating ?"

3) Python mailing list - 2008
http://mail.python.org/pipermail/python-list/2008-June/1143217.html
"When I start the server, it sometimes eats up 100% of the CPU for a good
minute or so... though none of the  threads are CPU-intensive"

4) Twisted
http://twistedmatrix.com/pipermail/twisted-python/2005-July/011048.html
"When I run a CPU intensive method via threads.deferToThread it takes all
the CPU away and renders the twisted process unresponsive."
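
The Zope report in item 1 tunes sys.setcheckinterval(), which counts
bytecode instructions. As far as I know, the new GIL in Python 3.2 replaced
that knob with a time-based one, sys.setswitchinterval(), and
sys.setcheckinterval() was removed in Python 3.9. A minimal sketch of
adjusting it temporarily (the helper name is made up for illustration):

```python
import sys


def with_switch_interval(seconds, func, *args):
    """Run func(*args) with a temporary GIL switch interval, then restore it."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        sys.setswitchinterval(previous)
```

A larger interval means fewer forced GIL hand-offs (less thrashing, worse
responsiveness); a smaller one means the opposite. It is a tuning knob, not a
fix for the scheduling problems discussed here.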

Admittedly, it is not easy to dig reports up in Google.

Finally, I think David explained the relevance of this problem quite nicely:
http://mail.python.org/pipermail/python-dev/2010-March/098416.html


*What about the new GIL?*

There is no real world experience with the new GIL since it is under
development. What we have is David's analysis and a few benchmarks from the
bug report.


*Evolving the GIL into a scheduler*

The problem addressed by the GIL has always been *scheduling* threads to the
interpreter, not just controlling access to it. The patches by Antoine and
David essentially evolve the GIL into a scheduler, however both cause thread
starvation or high rate of context switching in some scenarios (see data
below).
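
One rough way to observe such scheduling effects from Python code is to
measure how late a sleeping thread wakes up while another thread is busy
with pure-Python CPU work. This is only a sketch, not one of the benchmarks
from the bug report; the function name and parameters are made up:

```python
import threading
import time


def wakeup_latencies(samples=20, period=0.005, with_cpu_thread=True):
    """Return how much later than `period` each sleep in this thread returned."""
    stop = threading.Event()

    def spin():
        # CPU-bound background work that competes for the GIL.
        counter = 0
        while not stop.is_set():
            counter += 1

    background = threading.Thread(target=spin) if with_cpu_thread else None
    if background is not None:
        background.start()

    latencies = []
    try:
        for _ in range(samples):
            start = time.perf_counter()
            time.sleep(period)
            latencies.append(time.perf_counter() - start - period)
    finally:
        stop.set()
        if background is not None:
            background.join()
    return latencies
```

Comparing the results with and without the CPU thread gives a crude picture
of how long a waking thread waits for the interpreter on a given build.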


*BFS*

Enter BFS, a new scheduler designed by Con Kolivas, a Linux kernel hacker
who is an expert in this field:
http://ck.kolivas.org/patches/bfs/sched-BFS.txt
"The goal of the Brain Fuck Scheduler, referred to as BFS from here on, is
to completely do away with the complex designs of the past for the cpu
process scheduler and instead implement one that is very simple in basic
design. The main focus of BFS is to achieve excellent desktop interactivity
and responsiveness without heuristics and tuning knobs that are difficult
to understand, impossible to model and predict the effect of, and when tuned
to one workload cause massive detriment to another."

I submitted an implementation of BFS (bfs.patch) which on my machines gives
comparable performance to gilinter2.patch (Antoine's) and seems to schedule
threads more fairly, predictably, and with lower rate of context switching
(see data below).

There are, however, some issues in bfs.patch:

1) It works on top of the OS scheduler, which means (for all GIL patches!):
a) It does not control and is not aware of information such as OS thread
preemption, CPU core to run on, etc...
b) There may be hard to predict interaction between BFS and the
underlying 

Re: [Python-Dev] "Fixing" the new GIL

2010-04-12 Thread Nir Aides
There is no need for the "forced" switch to be guaranteed. The motivation
for the wait is to increase throughput and decrease latency when the OS
schedules the next thread to run on the same core as the yielding thread, but
the yielding thread is about to continue running into CPU-intensive library
code. An occasional spurious wakeup resulting in a missed switch will not
affect that. Note the similar logic in the new GIL.

I will add the loop to keep the code clean.
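
For reference, the looped wait Peter asks for looks roughly like this when
written in Python rather than in the patch's C. It is a sketch with made-up
names, not code from bfs.patch; the point is only that the predicate is
re-checked after every wakeup:

```python
import threading
import time


class SwitchSignal:
    """Hypothetical 'a switch happened' flag guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._switched = False  # the predicate, protected by the condition's lock

    def wait_for_switch(self, timeout):
        """Block until signal_switch() was called, or until timeout seconds pass."""
        deadline = time.monotonic() + timeout
        with self._cond:
            # Loop: wait() may return without the flag having changed.
            while not self._switched:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
            self._switched = False
            return True

    def signal_switch(self):
        with self._cond:
            self._switched = True
            self._cond.notify()
```

threading.Condition.wait_for() wraps the same loop, so in Python code it is
usually simpler to call that instead of writing the loop by hand.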

Thanks,
Nir



On Mon, Apr 12, 2010 at 10:49 PM, Peter Portante  wrote:

>  That code will not switch to another thread if the condition variable
> either does not actually block the thread on the call (which is allowed by
> the standard to give implementations some flexibility for making it work
> correctly – read the standard reasoning for more information), or the thread
> is woken up without a predicate change (also allowed by the standard for
> similar reasons). Both of those cases are called “spurious wake-ups”.
>
> You may not be able to readily get your implementation to behave that
> way, but in the wild, we need to account for this behavior because CPython
> will be run on systems where it will happen. :)
>
> -peter
>
>
>
> On 4/12/10 3:36 PM, "Nir Aides"  wrote:
>
> Please describe clearly a step by step scenario in which that code will
> fail.
>
>
> On Mon, Apr 12, 2010 at 10:25 PM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Yes, but unless you loop until the predicate is False, no forced switch is
> guaranteed to occur. You might as well remove the code. If you want to keep
> the code as is, call me when you need a life guard to help debug mystifying
> behaviors. ;) -peter
>
>
>
> On 4/12/10 3:17 PM, "Nir Aides" <n...@winpdb.org> wrote:
>
> The loop-less wait is similar to the one in new GIL. It is used to force a
> switch to next thread in particular scenario and the motivation is explained
> in comment to another if clause a few lines up. Those two if clauses can be
> joined though.
>
>
> On Mon, Apr 12, 2010 at 3:37 PM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Hmm, so I see in
> bfs_yield():
>
> +    if (tstate != NULL && bfs_thread_switch == tstate) {
> +        COND_RESET(tstate->bfs_cond);
> +        COND_WAIT(tstate->bfs_cond, bfs_mutex);
> +    }
>
> So, to me, the above code says, “Wait for the condition that tstate is
> either NULL, or bfs_thread_switch does not equal tstate”. So the predicate
> is: “(tstate != NULL && bfs_thread_switch == tstate)”.
>
> If the predicate is True before you call COND_WAIT() and True after you
> call COND_WAIT(), either you don’t need to call COND_WAIT() at all, or you
> need to loop until the predicate is False. There is no guarantee that a
> condition wait actually did anything at all. Yes, there will be spurious
> wake ups, but you can’t tell if the thread actually blocked and then woke
> up, or never blocked at all. If it never actually blocks, then that code
> path is not helpful.
>
> On Windows, before this loop in bfs_schedule():
>
> +    COND_RESET(tstate->bfs_cond);
> +    while (bfs_thread != tstate) {
> +        _bfs_timed_wait(tstate, timestamp);
> +        timestamp = get_timestamp();
> +    }
>
> You might want to avoid the call to reset the condition variable if the
> predicate is already False.
>
> -peter
>
>
>
> On 4/12/10 8:12 AM, "Nir Aides" <n...@winpdb.org> wrote:
>
> Hi Peter,
>
> There is no need for a loop in bfs_yield().
>
>
> On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html
>
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it q

Re: [Python-Dev] "Fixing" the new GIL

2010-04-12 Thread Nir Aides
Please describe clearly a step by step scenario in which that code will
fail.


On Mon, Apr 12, 2010 at 10:25 PM, Peter Portante  wrote:

>  Yes, but unless you loop until the predicate is False, no forced switch
> is guaranteed to occur. You might as well remove the code. If you want to
> keep the code as is, call me when you need a life guard to help debug
> mystifying behaviors. ;) -peter
>
>
>
> On 4/12/10 3:17 PM, "Nir Aides"  wrote:
>
> The loop-less wait is similar to the one in new GIL. It is used to force a
> switch to next thread in particular scenario and the motivation is explained
> in comment to another if clause a few lines up. Those two if clauses can be
> joined though.
>
>
> On Mon, Apr 12, 2010 at 3:37 PM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Hmm, so I see in
> bfs_yield():
>
> +    if (tstate != NULL && bfs_thread_switch == tstate) {
> +        COND_RESET(tstate->bfs_cond);
> +        COND_WAIT(tstate->bfs_cond, bfs_mutex);
> +    }
>
> So, to me, the above code says, “Wait for the condition that tstate is
> either NULL, or bfs_thread_switch does not equal tstate”. So the predicate
> is: “(tstate != NULL && bfs_thread_switch == tstate)”.
>
> If the predicate is True before you call COND_WAIT() and True after you
> call COND_WAIT(), either you don’t need to call COND_WAIT() at all, or you
> need to loop until the predicate is False. There is no guarantee that a
> condition wait actually did anything at all. Yes, there will be spurious
> wake ups, but you can’t tell if the thread actually blocked and then woke
> up, or never blocked at all. If it never actually blocks, then that code
> path is not helpful.
>
> On Windows, before this loop in bfs_schedule():
>
> +    COND_RESET(tstate->bfs_cond);
> +    while (bfs_thread != tstate) {
> +        _bfs_timed_wait(tstate, timestamp);
> +        timestamp = get_timestamp();
> +    }
>
> You might want to avoid the call to reset the condition variable if the
> predicate is already False.
>
> -peter
>
>
>
> On 4/12/10 8:12 AM, "Nir Aides" <n...@winpdb.org> wrote:
>
> Hi Peter,
>
> There is no need for a loop in bfs_yield().
>
>
> On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html
>
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it quickly) of how to implement POSIX semantics on
> Windows:
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>
>
> Before you can evaluate the effectiveness of any of the proposed scheduling
> schemes, the fundamental uses of mutexes and condition variables, and their
> implementations, must be sound.
>
> -peter
>
>
>
> On 4/11/10 6:50 PM, "Nir Aides" <n...@winpdb.org> wrote:
>
> Hello all,
>
> I would like to kick this discussion back to life with a simplified
> implementation of the BFS scheduler, designed by the Linux kernel hacker Con
> Kolivas: http://ck.kolivas.org/patches/bfs/sched-BFS.txt
>
> I submitted bfs.patch at http://bugs.python.org/issue7946. It is work in
> progress but is ready for some opinion.
>
>
> On my machine BFS gives comparable performance to gilinter, and seems to
> schedule threads more fairly, predictably, and with lower rate of context
> switching. Its basic design is very simple but nevertheless it was designed
> by an expert in this field, two characteristics which combine to make it
> attractive to this case.
>
> The problem addressed by the GIL has always been *scheduling* threads to
> the interpreter, not just control

Re: [Python-Dev] "Fixing" the new GIL

2010-04-12 Thread Nir Aides
At some point there was a loop; later it remained because I feel it is more
readable than a bunch of nested if-else clauses.
I should probably replace it with do {...} while(0).

I was conditioned with electrical shocks in the dungeons of a corporation to
always use for loops.

I uploaded the patch to Rietveld for code review comments:
http://codereview.appspot.com/857049

Nir


On Mon, Apr 12, 2010 at 3:48 PM, Peter Portante wrote:

>  And why the for(;;) loop in bfs_schedule()? I don’t see a code path that
> would loop there. Perhaps I am missing it ...
>
> -peter
>
>
>
> On 4/12/10 8:37 AM, "Peter Portante"  wrote:
>
> Hmm, so I see in bfs_yield():
>
> +    if (tstate != NULL && bfs_thread_switch == tstate) {
> +        COND_RESET(tstate->bfs_cond);
> +        COND_WAIT(tstate->bfs_cond, bfs_mutex);
> +    }
>
> So, to me, the above code says, “Wait for the condition that tstate is
> either NULL, or bfs_thread_switch does not equal tstate”. So the predicate
> is: “(tstate != NULL && bfs_thread_switch == tstate)”.
>
> If the predicate is True before you call COND_WAIT() and True after you
> call COND_WAIT(), either you don’t need to call COND_WAIT() at all, or you
> need to loop until the predicate is False. There is no guarantee that a
> condition wait actually did anything at all. Yes, there will be spurious
> wake ups, but you can’t tell if the thread actually blocked and then woke
> up, or never blocked at all. If it never actually blocks, then that code
> path is not helpful.
>
> On Windows, before this loop in bfs_schedule():
>
> +    COND_RESET(tstate->bfs_cond);
> +    while (bfs_thread != tstate) {
> +        _bfs_timed_wait(tstate, timestamp);
> +        timestamp = get_timestamp();
> +    }
>
> You might want to avoid the call to reset the condition variable if the
> predicate is already False.
>
> -peter
>
>
> On 4/12/10 8:12 AM, "Nir Aides"  wrote:
>
> Hi Peter,
>
> There is no need for a loop in bfs_yield().
>
>
> On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html
>
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it quickly) of how to implement POSIX semantics on
> Windows:
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>
>
> Before you can evaluate the effectiveness of any of the proposed scheduling
> schemes, the fundamental uses of mutexes and condition variables, and their
> implementations, must be sound.
>
> -peter
>
>
>
> On 4/11/10 6:50 PM, "Nir Aides" <n...@winpdb.org> wrote:
>
> Hello all,
>
> I would like to kick this discussion back to life with a simplified
> implementation of the BFS scheduler, designed by the Linux kernel hacker Con
> Kolivas: http://ck.kolivas.org/patches/bfs/sched-BFS.txt
>
> I submitted bfs.patch at http://bugs.python.org/issue7946. It is work in
> progress but is ready for some opinion.
>
> On my machine BFS gives comparable performance to gilinter, and seems to
> schedule threads more fairly, predictably, and with lower rate of context
> switching. Its basic design is very simple but nevertheless it was designed
> by an expert in this field, two characteristics which combine to make it
> attractive to this case.
>
> The problem addressed by the GIL has always been *scheduling* threads to
> the interpreter, not just controlling access to it, and therefore the GIL, a
> lock implemented as a simple semaphore was the wrong solution.
>
> The patches by Antoine and David essentially evolve the GIL into a
> scheduler, however both cause thread starvation or high rate of context
> switching in some scenarios:
>
> With Florent's

Re: [Python-Dev] "Fixing" the new GIL

2010-04-12 Thread Nir Aides
The loop-less wait is similar to the one in new GIL. It is used to force a
switch to next thread in particular scenario and the motivation is explained
in comment to another if clause a few lines up. Those two if clauses can be
joined though.


On Mon, Apr 12, 2010 at 3:37 PM, Peter Portante wrote:

>  Hmm, so I see in bfs_yield():
>
> +    if (tstate != NULL && bfs_thread_switch == tstate) {
> +        COND_RESET(tstate->bfs_cond);
> +        COND_WAIT(tstate->bfs_cond, bfs_mutex);
> +    }
>
> So, to me, the above code says, “Wait for the condition that tstate is
> either NULL, or bfs_thread_switch does not equal tstate”. So the predicate
> is: “(tstate != NULL && bfs_thread_switch == tstate)”.
>
> If the predicate is True before you call COND_WAIT() and True after you
> call COND_WAIT(), either you don’t need to call COND_WAIT() at all, or you
> need to loop until the predicate is False. There is no guarantee that a
> condition wait actually did anything at all. Yes, there will be spurious
> wake ups, but you can’t tell if the thread actually blocked and then woke
> up, or never blocked at all. If it never actually blocks, then that code
> path is not helpful.
>
> On Windows, before this loop in bfs_schedule():
>
> +    COND_RESET(tstate->bfs_cond);
> +    while (bfs_thread != tstate) {
> +        _bfs_timed_wait(tstate, timestamp);
> +        timestamp = get_timestamp();
> +    }
>
> You might want to avoid the call to reset the condition variable if the
> predicate is already False.
>
> -peter
>
>
>
> On 4/12/10 8:12 AM, "Nir Aides"  wrote:
>
> Hi Peter,
>
> There is no need for a loop in bfs_yield().
>
>
> On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante <
> peter.a.porta...@gmail.com> wrote:
>
> Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html
>
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it quickly) of how to implement POSIX semantics on
> Windows:
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>
>
> Before you can evaluate the effectiveness of any of the proposed scheduling
> schemes, the fundamental uses of mutexes and condition variables, and their
> implementations, must be sound.
>
> -peter
>
>
>
> On 4/11/10 6:50 PM, "Nir Aides" <n...@winpdb.org> wrote:
>
> Hello all,
>
> I would like to kick this discussion back to life with a simplified
> implementation of the BFS scheduler, designed by the Linux kernel hacker Con
> Kolivas: http://ck.kolivas.org/patches/bfs/sched-BFS.txt
>
> I submitted bfs.patch at http://bugs.python.org/issue7946. It is work in
> progress but is ready for some opinion.
>
>
> On my machine BFS gives comparable performance to gilinter, and seems to
> schedule threads more fairly, predictably, and with lower rate of context
> switching. Its basic design is very simple but nevertheless it was designed
> by an expert in this field, two characteristics which combine to make it
> attractive to this case.
>
> The problem addressed by the GIL has always been *scheduling* threads to
> the interpreter, not just controlling access to it, and therefore the GIL, a
> lock implemented as a simple semaphore was the wrong solution.
>
> The patches by Antoine and David essentially evolve the GIL into a
> scheduler, however both cause thread starvation or high rate of context
> switching in some scenarios:
>
> With Florent's write test ( http://bugs.python.org/issue7946#msg101120):
>
> 2 bg threads, 2 cores set to performance, karmic, PyCon patch, context
> switching shoots up to 200K/s.
> 2 bg threads, 1 core, set to on-demand, karmic, idle machine, gilinter
> patch starves one of the bg thre

Re: [Python-Dev] "Fixing" the new GIL

2010-04-12 Thread Nir Aides
Hi Peter,

There is no need for a loop in bfs_yield().


On Mon, Apr 12, 2010 at 4:26 AM, Peter Portante wrote:

>  Nir,
>
> Per the POSIX standard, both pthread_cond_wait() and
> pthread_cond_timedwait() need to be performed in a loop.  See the fourth
> paragraph of the description from:
>
>
> http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html
>
>
> For the Windows side, I think you have a similar problem. Condition
> variables are signaling mechanisms, and so they have a separate boolean
> predicate associated with them. If you release the mutex that protects the
> predicate, then after you reacquire the mutex, you have to reevaluate the
> predicate to ensure that the condition has actually been met.
>
> You might want to look at the following for a discussion (not sure how good
> it is, as I just google’d it quickly) of how to implement POSIX semantics on
> Windows:
>
> http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
>
>
> Before you can evaluate the effectiveness of any of the proposed scheduling
> schemes, the fundamental uses of mutexes and condition variables, and their
> implementations, must be sound.
>
> -peter
>

Re: [Python-Dev] "Fixing" the new GIL

2010-04-11 Thread Nir Aides
Hello all,

I would like to kick this discussion back to life with a simplified
implementation of the BFS scheduler, designed by the Linux kernel hacker Con
Kolivas: http://ck.kolivas.org/patches/bfs/sched-BFS.txt

I submitted bfs.patch at http://bugs.python.org/issue7946. It is a work in
progress but is ready for review.

On my machine BFS gives comparable performance to gilinter, and seems to
schedule threads more fairly, more predictably, and with a lower rate of
context switching. Its basic design is very simple, yet it was designed by
an expert in this field; these two characteristics combine to make it
attractive for this case.

The problem addressed by the GIL has always been *scheduling* threads to the
interpreter, not just controlling access to it, and therefore the GIL, a
lock implemented as a simple semaphore, was the wrong solution.

The patches by Antoine and David essentially evolve the GIL into a
scheduler; however, both cause thread starvation or a high rate of context
switching in some scenarios:

With Florent's write test (http://bugs.python.org/issue7946#msg101120):

- 2 bg threads, 2 cores set to performance, Karmic, PyCon patch: context
  switching shoots up to 200K/s.
- 2 bg threads, 1 core set to on-demand, Karmic, idle machine, gilinter
  patch: one of the bg threads is starved.
- 4 bg threads, 4x1 core Xeon, CentOS 5.3, gilinter patch: all bg threads
  are starved and context switching shoots up to 250K/s.

With the UDP test (http://bugs.python.org/file16316/udp-iotest.py), adding
zlib.compress(b'GIL') to the workload:

- both the gilinter and PyCon patches starve the IO thread.

The BFS patch currently incurs more overhead because it reads the time stamp
on every yield and schedule operation. In addition, some timestamp-related
issues remain to be addressed, such as getting different time stamp
readings on different cores on some (older) multi-core systems.

Any thoughts?

Nir



On Sun, Mar 14, 2010 at 12:46 AM, Antoine Pitrou wrote:

>
> Hello,
>
> As some of you may know, Dave Beazley recently exhibited a situation
> where the new GIL shows quite a poor behaviour (the old GIL isn't very
> good either, but still a little better). This issue is followed in
> http://bugs.python.org/issue7946
>
> This situation is when an IO-bound thread wants to process a lot of
> incoming packets, while one (or several) CPU-bound threads are also
> running. Each time the IO-bound thread releases the GIL, the CPU-bound
> thread gets it and keeps holding it for at least 5 milliseconds
> (default setting), which limits the number of individual packets which
> can be recv()'ed and processed per second.
>
> I have proposed two mechanisms, based on the same idea: IO-bound
> threads should be able to steal the GIL very quickly, rather than
> having to wait for the whole "thread switching interval" (again, 5 ms
> by default). They differ in how they detect an "IO-bound thread":
>
> - the first mechanism is actually the same mechanism which was
>  embodied in the original new GIL patch before being removed. In this
>  approach, IO methods (such as socket.read() in socketmodule.c)
>  releasing the GIL must use a separate C macro when trying to get the
>  GIL back again.
>
> - the second mechanism dynamically computes the "interactiveness" of a
>  thread and allows interactive threads to steal the GIL quickly. In
>  this approach, IO methods don't have to be modified at all.
>
> Both approaches show similar benchmark results (for the benchmarks
> that I know of) and basically fix the issue put forward by Dave Beazley.
>
> Any thoughts?
>
> Regards
>
> Antoine.
>
>


Re: [Python-Dev] "Fixing" the new GIL

2010-03-16 Thread Nir Aides
Hi,

I posted a small patch to the GIL which demonstrates the scheduling policy
of pthreads condition variables.

It seems that with pthreads a good policy is to allow (and help) the OS to
manage the scheduling of threads via the condition queue, without
introducing scheduling logic of our own.
The changes include removing the timeout from the new GIL wait, and
allowing CPU-bound threads to run long enough to let the OS recognize them
as such.
The patch uses a tick-based countdown-to-switch, but it should be changed to
a time-based countdown (if reading the clock is expensive, maybe it can be
done once every N ticks during the countdown).
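
The "read the clock only once every N ticks" idea can be sketched as follows. This is an illustration in Python rather than the C code of the patch; the class name `SwitchCountdown`, the 5 ms interval and the value of N are assumptions chosen for the example.

```python
import time


class SwitchCountdown:
    """Decide when the running thread should offer to give up the interpreter."""

    def __init__(self, interval=0.005, check_every=100, clock=time.monotonic):
        self.interval = interval        # time slice in seconds (assumed value)
        self.check_every = check_every  # N: ticks between clock reads (assumed)
        self.clock = clock
        self.deadline = clock() + interval
        self.ticks = 0

    def tick(self):
        """Call once per interpreter tick; return True when the slice expired."""
        self.ticks += 1
        if self.ticks % self.check_every:
            return False                # cheap path: the clock is not read
        if self.clock() < self.deadline:
            return False
        # Slice expired: start a new one and tell the caller to switch.
        self.deadline = self.clock() + self.interval
        return True
```

The trade-off is precision: a switch can be late by up to N ticks, so N would have to be tuned against the cost of a clock read on the target platform.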

It remains to explore what can be done on other platforms.

Nir




Re: [Python-Dev] "Fixing" the new GIL

2010-03-16 Thread Nir Aides
Hello Dave,

The following documentation suggests ordering in Linux is not FIFO:
http://www.opengroup.org/onlinepubs/95399/functions/pthread_cond_timedwait.html#tag_03_518_08_06
"Threads waiting on mutexes and condition variables are selected to proceed
in an order dependent upon the scheduling policy rather than in some fixed
order (for example, FIFO or priority). Thus, the scheduling policy
determines which thread(s) are awakened and allowed to proceed."

Here is the code:
http://www.google.com/codesearch/p?hl=en#5ge3gHPB4K4/gnu/glibc/glibc-linuxthreads-2.1.1.tar.gz%7CaeB7Uqo7T9g/linuxthreads/queue.h&q=pthread_cond_timedwait&exact_package=http://ftp.gnu.org/gnu/glibc/glibc-linuxthreads-2.1.1.tar.gz

If this is so, then it should affect the proposed fixes.
For example, waiting with a timeout should be avoided, no?
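
To make the FIFO question concrete: if strict arrival order is wanted regardless of which waiter the OS scheduling policy wakes, it can be layered on top of a condition variable, as Kristján suggests in the message quoted below. The following is a sketch in Python with a hypothetical `FifoLock` class; it is not part of any patch in this thread.

```python
import threading


class FifoLock:
    """Ticket lock: waiters are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            # Loop on the predicate: it does not matter which waiter the OS
            # wakes first, only the holder of the next ticket proceeds.
            while ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()
```

Note that waking every waiter on each release is itself a source of context switches, which is the cost side of enforcing a fixed order.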

Nir

2010/3/16 David Beazley 

> Python doesn't use a pthreads mutex for the GIL.  It has always used a
> binary semaphore implemented with condition variables (or just a pthreads
> semaphore if available).  The reason the performance is so bad is
> precisely due to the fact that it is using this implementation and the fact
> that there *IS* a FIFO queue of threads (associated with the condition
> variable).  The I/O performance problem with the new GIL gets much worse
> with many CPU-bound threads precisely because there is a FIFO queue
> involved.  This has been covered in my past GIL presentations.
>
> -Dave
>
>
>
> On Mar 16, 2010, at 5:52 AM, Kristján Valur Jónsson wrote:
>
> > How about attacking the original problem, then?
> >
> > The reason they thrash on the pthreads implementation is that a pthreads
> > mutex is assumed to be a short-held resource.  Therefore it will be
> > optimized in the following ways for multicore machines:
> > 1) There is a certain amount of spinning done, to try to acquire it
> > before blocking.
> > 2) It will employ unfair tactics to avoid lock convoying, meaning that
> > a thread coming in to acquire the mutex may get in before others that
> > are queued.  This is why "ticking" the GIL works so badly:  the thread
> > that releases the lock is usually the one that reacquires it even though
> > others may be waiting.  See e.g.
> > http://www.bluebytesoftware.com/blog/PermaLink,guid,e40c2675-43a3-410f-8f85-616ef7b031aa.aspx
> > for a discussion of this (albeit on Windows).
> >
> > On Windows, this isn't a problem.  The reason is that the GIL on Windows
> > is implemented using Event objects that don't cut these corners.  The
> > Event provides you with a strict FIFO queue of objects waiting for the
> > event.
> >
> > If pthreads doesn't provide a synchronization primitive similar to that,
> > one that doesn't thrash and has a true FIFO queue, it is possible to
> > construct such a thing using condition variables and critical sections.
> > Perhaps the POSIX semaphore API is more appropriate in this case.
> >
> > By the way, this also shows another problem with (old) Python.  There is
> > only one core locking primitive, the PyThread_type_lock.  It is being used
> > both as a critical section in the traditional sense, and also as this
> > sort-of-inverse lock that the GIL is.  In the modern world, where the
> > intended behaviour of these is quite different, there is no
> > one-size-fits-all.  On Windows in particular, the use of the Event object
> > based lock is not ideal for other uses than the GIL.
> >
> >
> > In the new GIL, there appear to be several problems:
> > 1) There is no FIFO queue of threads wanting the GIL, thus thread
> > scheduling becomes non-deterministic.
> > 2) The "ticking" of the GIL is now controlled by a condition variable
> > timeout.  There appears to be no way to prevent many such timeouts from
> > being in progress at the same time, thus you may have an unnecessarily
> > high rate of ticking going on.
> > 3) There isn't an immediate GIL request made when an IO thread requests
> > the GIL back, only after an initial timeout.
> >
> > What we are trying to write here is a thread scheduler, and that is
> > complex business.
> > K
> >
> >
> >
> >> -Original Message-
> >> From: python-dev-bounces+kristjan=ccpgames@python.org
> >> [mailto:python-dev-bounces+kristjan =
> ccpgames@python.org] On Behalf
> >> Of David Beazley
> >> Sent: 15. mars 2010 03:07
> >> To: python-dev@python.org
> >> Subject: Re: [Python-Dev] "Fixing" the new GIL
> >>
> >> happen to be performing CPU intensive work at the same time, it would
> >> be nice if they didn't thrash on multiple cores (the problem with the
> >> old GIL) and if I/O is
> >
>

Re: [Python-Dev] "Fixing" the new GIL

2010-03-14 Thread Nir Aides
inline:

On Sun, Mar 14, 2010 at 3:54 PM, Peter Portante
wrote:

>  On 3/14/10 7:31 AM, "Nir Aides"  wrote:
>
> There are two possible problems with Dave's benchmark:
>
> 1) On my system setting TCP_NODELAY option on the accepted server socket
> changes results dramatically.
>
> Could you document what you saw and explain how you think TCP_NODELAY makes
> a difference, including what kind of system you ran your tests and what the
> application was that demonstrates those dramatic results?
>

I first disabled the call to spin(), but the client running time remained
around 30 seconds.
I then added TCP_NODELAY, and the running time dropped to a few dozen
milliseconds for the entire no-spin run.
The system is Ubuntu Karmic 64-bit with the latest revision of Python 3.2.
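
For anyone who wants to repeat the experiment, setting the option looks roughly like this. The helper name `set_nodelay` is an assumption for illustration and does not come from Dave's benchmark code.

```python
import socket


def set_nodelay(conn):
    """Disable Nagle's algorithm on a TCP socket so small writes go out at once."""
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return conn


# Typical use in a server loop (sketch):
#     conn, addr = server_sock.accept()
#     set_nodelay(conn)
```

TCP_NODELAY turns off Nagle's algorithm, which can otherwise hold back small writes; whether that fully accounts for the difference in timings reported above has not been established here.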

> 2) What category of socket servers is Dave's spin() function intended to
> simulate?
>
> What is the problem you are trying to get at with this question?
>
> Does Dave's spin() function have to have a category? Or perhaps the question
> is, for these solutions, what category of application do they hurt? Perhaps
> we can look at the solutions as general, but consider their impact in
> various categories.
>

In Dave's code sample, spin() is loading the CPU regardless of
requests. This may demonstrate how short IO-bound requests behave while
the server is processing a long CPU-intensive request implemented in Python,
or an ongoing CPU-intensive task unrelated to incoming requests. However, is
this typical for socket servers?

If you change the sample server code to start a CPU-bound thread with work X
for each incoming request, you will see different behavior.

> There is still the question of latency - a single request which takes long
> time to process will affect the latency of other "small" requests. However,
> it can be argued if such a scenario is practical, or if modifying the GIL is
> the solution.
>
> Perhaps Dave already documented this effect in his visualizations, no?
>

Naturally. I did not imply otherwise. His analysis is brilliant.


> If a change is still required, then I vote for the simpler approach - that
> of having a special macro for socket code.
>
> What is that simpler approach? How would that special macro work?
>

The special macro for socket code is one of the alternatives proposed by
Antoine above.

However, thinking about it again, with this approach, as soon as the new
incoming request tries to read a file, query the DB, decompress some data or
do anything else which releases the GIL, it goes back to square one, no?

> I remember there was reluctance in the past to repeat the OS scheduling
> functionality and for a good reason.
>
>
> In what ways do we consider the CPython interpreter to be different than
> another application that has multiple threads and contention for one
> resource? Perhaps we have a unique problem among all other user space
> applications. Perhaps we don’t.
>

I think a rule of thumb is to protect a resource (typically a data
structure) as tightly as possible, avoiding, for example, holding a lock
across function calls where possible. In contrast, CPython is "locked" the
entire time.

> As for the behavior of the GIL, how are the proposed solutions repeating OS
> scheduling functionality?
>

Dave discussed this in his analysis.

Nir


Re: [Python-Dev] "Fixing" the new GIL

2010-03-14 Thread Nir Aides
There are two possible problems with Dave's benchmark:

1) On my system setting TCP_NODELAY option on the accepted server socket
changes results dramatically.
2) What category of socket servers is Dave's spin() function intended to
simulate?

In a server which involves CPU intensive work in response to a socket
request the behavior may be significantly different.
In such a system, high CPU load will significantly reduce socket
responsiveness which in turn will reduce CPU load and increase socket
responsiveness.

Testing with a modified server that reflects the above indicates the new GIL
behaves just fine in terms of throughput.
So a change to the GIL may not be required at all.

There is still the question of latency - a single request which takes a long
time to process will affect the latency of other "small" requests. However,
it is debatable whether such a scenario is practical, or whether modifying
the GIL is the solution.

If a change is still required, then I vote for the simpler approach - that
of having a special macro for socket code.
I remember there was reluctance in the past to repeat the OS scheduling
functionality and for a good reason.

Nir




Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-02-02 Thread Nir Aides
It seems the problem under discussion is already taken care of in Python.
It possibly remains to verify that the logic described below cannot
generate deadlocks.

From the Python docs: http://docs.python.org/c-api/init.html
"Another important thing to note about threads is their behaviour in the
face of the C fork() call. On most systems with fork(), after a process
forks only the thread that issued the fork will exist. That also means any
locks held by other threads will never be released. Python solves this
for os.fork() by acquiring the locks it uses internally before the fork, and
releasing them afterwards. In addition, it resets any Lock Objects in the
child. When extending or embedding Python, there is no way to inform Python
of additional (non-Python) locks that need to be acquired before or reset
after a fork. OS facilities such as posix_atfork() would need to be used to
accomplish the same thing. Additionally, when extending or embedding Python,
calling fork() directly rather than through os.fork() (and returning to or
calling into Python) may result in a deadlock by one of Python’s internal
locks being held by a thread that is defunct after the
fork. PyOS_AfterFork() tries to reset the necessary locks, but is not always
able to."
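
For pure-Python code, the "reset in the child" idea from the quoted documentation can be sketched with os.register_at_fork(), assuming Python 3.7 or later on a Unix platform (that function postdates this thread). The class name `ForkSafeLock` is hypothetical.

```python
import os
import threading


class ForkSafeLock:
    """A lock that is replaced by a fresh, unlocked one in a forked child."""

    def __init__(self):
        self._lock = threading.Lock()
        # os.register_at_fork is only available on Unix in Python 3.7+.
        # Handlers cannot be unregistered, so this object stays referenced.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self._reset)

    def _reset(self):
        # In the child, the thread that held the old lock no longer exists,
        # so the old lock could never be released. Start over.
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc):
        self._lock.release()
```

This only resets the lock itself. Whatever data the lock was protecting may still have been caught mid-update at the moment of the fork, so a sketch like this does not make mixing threads and fork() safe in general.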

On Sat, Jan 30, 2010 at 12:14 PM, Pascal Chambon
wrote:

>
> > [...]
> > What dangers do you refer to specifically? Something reproducible?
> > -L
>
>
> Since it's a race condition issue, it's not easily reproducible with normal
> libraries - which only take threading locks for small moments.
> But it can appear if your threads make good use of the threading module. By
> forking randomly, you have chances that the main locks of the logging module
> get frozen in an "acquired" state (even though their owner threads do not
> exist in the child process), and your next attempt to use logging will
> result in a pretty deadlock (on some *nix platforms, at least). This issue
> led to the creation of python-atfork by the way.
>
>
> Stefan Behnel wrote:
>
> Stefan Behnel, 30.01.2010 07:36:
>
>
>  Pascal Chambon, 29.01.2010 22:58:
>
>
>  I've just recently realized the huge problems surrounding the mix of
> multithreading and fork() - i.e. that only the main thread actually
> survives the fork(), and that process data (in particular,
> synchronization primitives) could be left in a dangerously broken state
> because of such forks, in multithreaded programs.
>
>
>  I would *never* have even tried that, but it doesn't surprise me that it
> works basically as expected. I found this as a quick intro:
> http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2003-09/0672.html
>
>  ... and another interesting link that also describes exec() usage in this
> context.
> http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
>
> Stefan
>
>
>
>  Yep, these links sum it up quite well.
> But to me it's not a matter of "trying" to mix threads and fork - most
> people won't seek trouble on purpose.
> It's simply the fact that, in a multithreaded program (i.e., any program of
> some importance), multiprocessing modules will be impossible to use safely
> without a complex synchronization of all threads to prepare the underlying
> forking (and we know that using multiprocessing can be a serious benefit,
> for GIL/performance reasons).
> Solutions to fork() issues clearly exist - just add a "use_forking=yes"
> attribute to subprocess functions, and users will be free to use the
> spawnl() semantics, which are already implemented on win32 platforms, and
> which give full control over both threads and subprocesses. Honestly, I
> don't see how it will complicate things, except slightly for the programmer
> who will have to edit the code to add spawnl() support (I might help on
> that).
>
> Regards,
> Pascal
>
>
>