[issue23269] Tighten-up search loops in sets

2015-01-26 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Applied Serhiy's version of the patch, but may switch to the j=0 version later.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23269] Tighten-up search loops in sets

2015-01-26 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0b2a3d764e63 by Raymond Hettinger in branch 'default':
Issue #23269:  Tighten search_loop in set_insert_clean()
https://hg.python.org/cpython/rev/0b2a3d764e63

--
nosy: +python-dev




[issue23269] Tighten-up search loops in sets

2015-01-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I have no strong preference. This code is used very rarely and I have no 
example that exposes the performance difference. But how large is the benefit 
of your patch? It doesn't make the C code any cleaner.

--




[issue23269] Tighten-up search loops in sets

2015-01-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> May be worth to move this test outside of the loop as in the 
> set_lookup function.

The loop used to be this way but testing showed no benefit, so a while back I 
recombined it back to a j=0 start point and the code looked much nicer.  I 
don't really want to go back if I don't have to.

There may or may not be a microscopic benefit to moving the leaq 9(%rcx), %rsi 
/ cmpq %rsi, %rax / jae L237 sequence before the table[i+0]->key == NULL check. 
I no longer have access to VTune to verify that this highly predictable 
branch is essentially free.   I do have a preference at this point for the 
simpler code -- that will make future optimization work easier (perhaps 
inlining the lookkey code into set_insert_key, set_discard, and set_contains).  
Also, looking at the disassembly of your patch, there is an additional cost for 
setting up the table[i+1] entry that wasn't there before (two new extra 
instructions, leaq 1(%rcx), %rsi; salq $4, %rsi, occur before the 
normal lookup-and-test sequence: leaq 0(%rbp,%rsi), %rdx; cmpq $0, (%rdx); je  
L237).

That said, if you think it is important to move the table[i+0] test outside the 
(i + LINEAR_PROBES <= mask) test, just say so and I'll apply your version of 
the patch instead of mine.  Otherwise, I prefer the cleaner code (both C and 
generated assembly) of my patch.

--




[issue23269] Tighten-up search loops in sets

2015-01-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What are the timing results?

There is a large chance that the first tested entry is empty. It may be worth 
moving this test outside of the loop, as in the set_lookup function.

"i &= mask;" may be moved to the start of the loop.
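A hedged sketch of the shape being suggested (the entry layout, names, and the perturb recurrence here are illustrative, not the actual tight2a.diff):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, illustrative entry record (not the real setentry). */
typedef struct { void *key; long hash; } entry_t;

/* Test the first entry before entering the loop (in a sparse table it
   is very often empty), and apply the "& mask" wrap at the start of
   each iteration rather than the end.  Assumes the table has at least
   one empty slot, so the loop always terminates. */
static entry_t *find_slot(entry_t *table, size_t mask, size_t hash)
{
    size_t i = hash & mask;
    if (table[i].key == NULL)              /* common case: hit on probe 0 */
        return &table[i];
    size_t perturb = hash;
    while (1) {
        perturb >>= 5;                     /* illustrative recurrence */
        i = (i * 5 + 1 + perturb) & mask;  /* mask at the loop start */
        if (table[i].key == NULL)
            return &table[i];
    }
}
```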

--
Added file: http://bugs.python.org/file37858/tight2a.diff




[issue23269] Tighten-up search loops in sets

2015-01-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Reopening this with a better patch that nicely tightens the inner-loop without 
an algorithmic change or any downside.

Currently, the entry computation adds a constant (i+j), masks it (&mask) to 
wrap on overflow, scales it to a table index (<< 4), and then does the key 
lookup (one memory access).   These steps are sequential (not parallelizable).

The patch moves the wrap-on-overflow decision out of the inner loop, adding a 
single, fast, predictable branch (i + LINEAR_PROBES > mask).   The inner loop 
then simplifies to an add, a fetch, and a test (no masking and shifting).
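As a rough C illustration of that transformation (simplified names and entry layout, not the actual setobject.c code; it assumes the table always has an empty slot, as in set_insert_clean during a resize):

```c
#include <assert.h>
#include <stddef.h>

#define LINEAR_PROBES 9

/* Simplified, illustrative entry record (not the real setentry). */
typedef struct { void *key; long hash; } entry_t;

/* Sketch of the patched probe: the wrap-on-overflow check
   (i + LINEAR_PROBES > mask) is hoisted out of the inner loop, so each
   inner iteration is just an add, a fetch, and a test. */
static entry_t *find_empty_slot(entry_t *table, size_t mask, size_t hash)
{
    size_t perturb = hash;
    size_t i = hash & mask;
    while (1) {
        if (table[i].key == NULL)
            return &table[i];
        if (i + LINEAR_PROBES <= mask) {       /* single predictable branch */
            entry_t *entry = &table[i];
            entry_t *limit = entry + LINEAR_PROBES;
            while (entry++ < limit)            /* no masking or shifting */
                if (entry->key == NULL)
                    return entry;
        }
        perturb >>= 5;                         /* illustrative recurrence */
        i = (i * 5 + 1 + perturb) & mask;      /* next probe window */
    }
}
```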

The generated code on Clang is very tight:

LBB29_26:  
cmpq$0, (%rsi)   # entry->key == NULL
je  LBB29_30
incq%rdx # j++
addq$16, %rsi# entry++  (done in parallel with the incq)
cmpq$9, %rdx # j <= LINEAR_PROBES
jbe LBB29_26

On gcc-4.9, the loop is unrolled and we get even better code:

leaq(%rbx,%rsi), %rdx   # entry = table[i]
cmpq$0, (%rdx)  # entry.key == NULL
je  L223

leaq16(%rbx,%rsi), %rdx # entry = table[i+1];
cmpq$0, (%rdx)  # entry.key == NULL
je  L223

leaq32(%rbx,%rsi), %rdx
cmpq$0, (%rdx)
je  L223

--
resolution: rejected -> 
status: closed -> open
Added file: http://bugs.python.org/file37857/tight2.diff




[issue23269] Tighten-up search loops in sets

2015-01-20 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
resolution:  -> rejected
status: open -> closed




[issue23269] Tighten-up search loops in sets

2015-01-19 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Either way the improvement doesn't look terrific, so I would suggest not to 
bother with this.

--




[issue23269] Tighten-up search loops in sets

2015-01-19 Thread Raymond Hettinger

Changes by Raymond Hettinger :


Added file: http://bugs.python.org/file3/measure_build_set.py




[issue23269] Tighten-up search loops in sets

2015-01-19 Thread Raymond Hettinger

Changes by Raymond Hettinger :


Added file: http://bugs.python.org/file37776/limit2.diff




[issue23269] Tighten-up search loops in sets

2015-01-19 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Patch timings give inconsistent results:  GCC 4.9 generates faster code and 
Clang generates slower code :-(

--
Added file: http://bugs.python.org/file37775/timings.txt




[issue23269] Tighten-up search loops in sets

2015-01-19 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +pitrou, serhiy.storchaka




[issue23269] Tighten-up search loops in sets

2015-01-19 Thread Raymond Hettinger

New submission from Raymond Hettinger:

First draft of a patch to switch from a table[(i+j)&mask] style of entry 
calculation to an entry++ style.  The entry computation simplifies from 
add/shl4/and/lea to a single add16.  To do this, the linear probes are limited 
to the length of the table rather than wrapping around.
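A rough C sketch of the two styles (illustrative names and a simplified entry layout, not the actual limit.diff code):

```c
#include <assert.h>
#include <stddef.h>

#define LINEAR_PROBES 9

/* Simplified, illustrative entry record (not the real setentry). */
typedef struct { void *key; long hash; } entry_t;

/* Old style: every probe recomputes the slot with add/and/shift before
   the load -- a serial dependency chain. */
static entry_t *probe_masked(entry_t *table, size_t mask, size_t i, size_t j)
{
    return &table[(i + j) & mask];
}

/* New style: walk a pointer and clamp the run at the end of the table
   instead of wrapping; each step compiles to a single add16. */
static entry_t *probe_limited(entry_t *table, size_t mask, size_t i)
{
    size_t nprobes = LINEAR_PROBES;
    if (i + nprobes > mask)
        nprobes = mask - i;            /* extra setup: compute the limit */
    entry_t *entry = &table[i];
    entry_t *limit = entry + nprobes;
    for (; entry <= limit; entry++)    /* entry++ style: no masking */
        if (entry->key == NULL)
            return entry;
    return NULL;                       /* caller rehashes / retries */
}
```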

I haven't timed this one yet (I have just looked at the assembly) or worked 
through the implications.  The new approach is a little less attractive but may 
be easier to reason about than the mask wrap-around.  Also, it disadvantages 
small sets, where the wrap-around property assured that the linear probes would 
always find a hit without any entry being checked more than once.  There is 
extra setup time before the linear probes to compute the limit.  Also, since 
the loop size is now variable instead of fixed, the inner loop for 
set_insert_clean() no longer gets unrolled by either GCC or Clang.

With the change, the inner loop of set_insert_clean() is very tight:

L436:
addq    $16, %rax       #, entry
cmpq    %rdx, %rax      # limit, entry      entry < limit
ja      L399            #,
cmpq    $0, (%rax)      #,* entry           entry->key == NULL
jne     L436

The code for lookkey() is also tight (though I can't tell why the entry->key 
gets loaded from memory twice):

L102:
cmpq%r15, %r13  # tmp170, D.12706  entry->key == dummy
jne L103#,
testq   %r12, %r12  # freeslot
cmove   %r14, %r12  # entry -> freeslot
L103:
addq    $16, %r14   #, entry    entry++
cmpq%r14, %rbx  # entry, limit
jb  L146#,
movq(%r14), %r13# MEM[base: entry_65, offset: 0B], D.12706
testq   %r13, %r13  # D.12706   entry->key == NULL
je  L147#,
cmpq%rbp, 8(%r14)   # hash, MEM[base: entry_104, offset: 8B]
je  L148#,  entry->hash == hash
?   movq(%r14), %r13# MEM[base: entry_104, offset: 0B], D.12706
jmp L102#

--
assignee: rhettinger
components: Interpreter Core
files: limit.diff
keywords: patch
messages: 234308
nosy: rhettinger
priority: low
severity: normal
stage: patch review
status: open
title: Tighten-up search loops in sets
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file37773/limit.diff
