New submission from David Beazley <d...@dabeaz.com>:

The attached patch makes two simple refinements to the new GIL implemented in 
Python 3.2.   Each is briefly described below.

1. Changed mechanism for thread time expiration

In the current implementation, threads perform a timed-wait on a condition 
variable.  If time expires and no thread switches have occurred, the currently 
running thread is forced to drop the GIL.

In the patch, timeouts are now performed by a special "GIL monitor" thread.  
This thread runs independently of Python and simply handles time expiration.  
Basically, it records the number of thread switches, sleeps for a specified 
interval (5ms), and then looks at the number of thread switches again.  If no 
switches occurred, it forces the currently running thread to drop the GIL.

With this monitor thread, it is no longer necessary to perform any timed 
condition variable waits.  This approach has a few subtle benefits.  First, 
threads no longer sit in a wait/timeout cycle when trying to get the GIL (so, 
there is less overhead).   Second, you get FIFO scheduling of threads.  When 
time expires, the thread that has been waiting the longest on the condition 
variable runs next.  Generally, you want this.

2. A very simple two-level priority mechanism

A new attribute 'cpu_bound' is added to the PyThreadState structure.  If a 
thread is ever forced to drop the GIL, this attribute is simply set True (1).  
If a thread gives up the GIL voluntarily, it is set back to False (0).  This 
attribute is used to set up simple scheduling (described next).

There are now two separate condition variables (gil_cpu_cond) and (gil_io_cond) 
that separate waiting threads according to their cpu_bound attribute setting.  
CPU-bound threads wait on gil_cpu_cond whereas I/O-bound threads wait on 
gil_io_cond. 

Using the two condition variables, the following scheduling rules are enforced:

   - If there are any waiting I/O bound threads, they are always signaled 
first, before any CPU-bound threads.
   - If an I/O bound thread wants the GIL, but a CPU-bound thread is running, 
the CPU-bound thread is immediately forced to drop the GIL.
   - If a CPU-bound thread wants the GIL, but another CPU-bound thread is 
running, the running thread is immediately forced to drop the GIL if its time 
period has already expired.

Results
-------
This patch gives excellent results for both the ccbench test and all of my 
previous I/O bound tests.  Here is the output:

== CPython 3.2a0.0 (py3k:80470:80497M) ==
== i386 Darwin on 'i386' ==

--- Throughput ---

Pi calculation (Python)

threads=1: 871 iterations/s.
threads=2: 844 ( 96 %)
threads=3: 838 ( 96 %)
threads=4: 826 ( 94 %)

regular expression (C)

threads=1: 367 iterations/s.
threads=2: 345 ( 94 %)
threads=3: 339 ( 92 %)
threads=4: 327 ( 89 %)

bz2 compression (C)

threads=1: 384 iterations/s.
threads=2: 728 ( 189 %)
threads=3: 695 ( 180 %)
threads=4: 707 ( 184 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 2 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 1 ms.)
CPU threads=3: 1 ms. (std dev: 1 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

--- I/O bandwidth ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 5850.9 packets/s.
CPU threads=1: 5246.8 ( 89 %)
CPU threads=2: 4228.9 ( 72 %)
CPU threads=3: 4222.8 ( 72 %)
CPU threads=4: 2959.5 ( 50 %)

Particular attention should be given to tests involving I/O performance.  In 
particular, here are the results of the I/O bandwidth test using the unmodified 
GIL:

--- I/O bandwidth ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 6007.1 packets/s.
CPU threads=1: 189.0 ( 3 %)
CPU threads=2: 19.7 ( 0 %)
CPU threads=3: 19.7 ( 0 %)
CPU threads=4: 5.1 ( 0 %)

Other Benefits
--------------
This patch does not involve any complicated libraries, platform specific 
functionality, low-level lock twiddling, or mathematically complex priority 
scheduling algorithms.  Emphasize: The code is simple.

Negative Aspects
----------------
This modification might introduce a starvation effect where CPU-bound threads 
never get to run if there is an extremely heavy load of I/O-bound threads 
competing for the GIL.

Is starvation a real problem or a theoretical problem?  Hard to say. Would need 
study.

----------
components: Interpreter Core
files: gil.patch
keywords: patch
messages: 104192
nosy: dabeaz
severity: normal
status: open
title: Refinements to Python 3 New GIL
type: performance
versions: Python 3.2
Added file: http://bugs.python.org/file17083/gil.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8532>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to