New submission from Raymond Hettinger:

I'm working on a patch for the lookkey() functions in Object/setobject.c.  The 
core idea is to follow the probe sequence as usual but to also check an 
adjacent entry for each probe (i.e. check &table[i & mask] as usual and then 
check &table[(i & mask) ^ 1] before going on the next probes which are 
scattered around memory).

On modern processors (anything in the last decade), this is likely to reduce 
the cost of hash collisions and reduce the number of cache lines fetched from 
memory.

+ Cache line benefit:  improves the odds that the adjacent probe will be on the 
same cache line, thereby reducing the total number of cache lines fetched.  
This benefit will work for set insertions as well as set lookups (a partial 
write to a cache line requires that a full cache line be read prior to writing, 
so it is helpful if we've just examined another slot on the same cache line).

+ Parallel execution:  because the second lookup has no data dependency on the 
first lookup, some of its instructions can be executed in parallel by the 
out-of-order engine.

+ Reduced loop overhead:  the second lookup doesn't require a new computation 
of the index *i* (we just do a XOR 1), a new perturb shift, a new masking of 
*i*, or a jump to the top of the loop.  In other words, it has all the usual 
benefits of loop unrolling.

- On the downside, even this partial unrolling when increase the total code 
size which has the negative effect of blowing other code out of the I-cache 
(typically 32kb).

? I'm unsure whether the additional if-statements will have a net positive or 
net negative effect on branch prediction rates (positive because each branch 
can be tracked separately or negative because additional branches use up more 
space in the branch prediction tables).  Once the patch is ready, I'll run 
CacheGrind to get a better sense of what is happening.

----------
components: Interpreter Core
messages: 195506
nosy: haypo, rhettinger
priority: normal
severity: normal
status: open
title: Reduce the cost of hash collisions for set objects
type: performance
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to