Jesús Cea Avión <j...@jcea.es> added the comment:

These operations are potentially slow; there are no real-time guarantees.

My machine is quite capable (16 CPUs), but let's see what a bit of DTrace 
scripting tells us:

First, let's time the open:

"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
    self->ts = timestamp;
}

syscall::open*:return
/self->ts/
{
    @stats = quantize(timestamp-self->ts);
    self->ts = 0;
}
"""

The result, times in nanoseconds:

"""
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4497743  
            8192 |                                         17358    
           16384 |                                         1592     
           32768 |                                         2046     
           65536 |                                         812      
          131072 |                                         374      
          262144 |                                         0 
"""

Most "open"s finish in about 4 microseconds, but we have a handful of "slow" 
opens taking hundreds of microseconds.

Anyway, arguing about the GIL here is a non-issue, since the "open" is only 
done at object creation, and a sane program will only create a handful of 
devpoll objects.
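
For context, the raw /dev/poll interface that the devpoll object wraps looks 
roughly like this (a C sketch, error handling omitted; the names are mine, not 
the module's):

"""
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/devpoll.h>

int main(void)
{
    /* One open(2) per devpoll object, at creation time. */
    int dp = open("/dev/poll", O_RDWR);

    /* register()/unregister() boil down to writing pollfd entries. */
    struct pollfd pfd;
    pfd.fd = STDIN_FILENO;      /* stand-in for a real socket */
    pfd.events = POLLIN;
    pfd.revents = 0;
    write(dp, &pfd, sizeof(pfd));

    /* poll() is an ioctl(DP_POLL) on the same descriptor. */
    struct pollfd results[16];
    struct dvpoll dvp;
    dvp.dp_fds = results;
    dvp.dp_nfds = 16;
    dvp.dp_timeout = 1000;      /* milliseconds */
    int n = ioctl(dp, DP_POLL, &dvp);

    close(dp);
    return n < 0;
}
"""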

Now let's look at the write (this time matching the /dev/poll file descriptor 
by path):

"""
/* Time every write(2) against the /dev/poll descriptor. */
syscall::write:entry
/fds[arg0].fi_pathname == "/dev/poll"/
{
    self->ts = timestamp;
}

syscall::write:return
/self->ts/
{
    @stats = quantize(timestamp-self->ts);
    self->ts = 0;
}
"""

The results with a single descriptor registered:

"""
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4102701  
            1024 |                                         28514    
            2048 |                                         11269    
            4096 |                                         537      
            8192 |                                         245      
           16384 |                                         377      
           32768 |                                         193      
           65536 |                                         134      
          131072 |                                         71       
          262144 |                                         0  
"""

Most writes are really fast, around half a microsecond, but we have sporadic 
latencies of over a hundred microseconds.

Re-registering 200 sockets per loop iteration, I get:

"""
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |                                         50       
            2048 |                                         94       
            4096 |                                         157      
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1112314  
           16384 |@                                        22977    
           32768 |                                         1584     
           65536 |                                         842      
          131072 |                                         168      
          262144 |                                         0 
"""

Most writes take around 8 microseconds.
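
At the /dev/poll level, re-registering 200 descriptors is just writing 200 
struct pollfd entries to the device, so it is not surprising that each write 
costs more than the single-entry case. A hypothetical sketch of one batched 
write, purely to illustrate what is being timed (the names are mine):

"""
#include <poll.h>
#include <unistd.h>

/* 'dp' is the fd from open("/dev/poll", O_RDWR); 'socks' holds the 200
   socket descriptors. Both are hypothetical parameters. */
void reregister_all(int dp, const int socks[200])
{
    struct pollfd batch[200];
    for (int i = 0; i < 200; i++) {
        batch[i].fd = socks[i];
        batch[i].events = POLLIN;
        batch[i].revents = 0;
    }
    /* A single write (re)registers the whole set. */
    write(dp, batch, sizeof(batch));
}
"""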

So now the problem is to estimate how much time I need to release & reacquire 
the GIL. For that, and while we don't have DTrace integration in Python (my 
next project, I hope), I changed the source code to add a manual probe using 
the Solaris high-resolution timers. I get a median time of around 550 
nanoseconds (half a microsecond), with a few spikes, when the GIL is not 
contended.
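
The manual probe looked more or less like this (a sketch only; the struct and 
function names are illustrative, not the actual code in Modules/selectmodule.c):

"""
#include <stdio.h>
#include <unistd.h>
#include <poll.h>
#include <sys/time.h>            /* gethrtime() on Solaris */
#include "Python.h"

/* Illustrative stand-in for the devpoll object's relevant fields. */
typedef struct {
    int fd_devpoll;              /* fd from open("/dev/poll", O_RDWR) */
    int n_fds;                   /* number of buffered pollfd entries */
    struct pollfd *fds;
} devpoll_state;

static ssize_t
flush_registrations_timed(devpoll_state *self)
{
    ssize_t n;
    hrtime_t t0, t1, t2, t3;

    t0 = gethrtime();
    Py_BEGIN_ALLOW_THREADS
    t1 = gethrtime();
    n = write(self->fd_devpoll, self->fds,
              self->n_fds * sizeof(struct pollfd));
    t2 = gethrtime();
    Py_END_ALLOW_THREADS
    t3 = gethrtime();

    /* (t1-t0) + (t3-t2): cost of releasing and reacquiring the GIL.
       (t2-t1):           the write(2) syscall itself.               */
    fprintf(stderr, "GIL overhead: %lld ns, write: %lld ns\n",
            (long long)((t1 - t0) + (t3 - t2)),
            (long long)(t2 - t1));
    return n;
}
"""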

So, releasing the GIL adds around half a microsecond to the syscalls. That is 
fast, but comparable to the actual duration of the syscall. Releasing the GIL 
roughly doubles the time needed to register & unregister a socket (assuming we 
register a single socket per iteration, which is realistic in real code).

Being realistic, any work we do with the descriptors (like actually reading 
from or writing to them) is going to make this 0.5 microsecond saving 
irrelevant. By releasing the GIL we are protected from any contention inside 
the kernel, and we are playing by the rules :-).

Anyway, I am open to feedback.

PS: I also checked with GIL contention, and the results are comparable. That 
is surprising; maybe it is related to the GIL improvements in 3.3. I haven't 
investigated it further.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6397>
_______________________________________