[issue29304] dict: simplify lookup function

2017-06-18 Thread INADA Naoki

Changes by INADA Naoki :


--
pull_requests: +2323

[issue29304] dict: simplify lookup function

2017-06-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Dmitry, please open a new issue for your proposal. It changes visible 
behavior, while Inada's patch doesn't.

--

[issue29304] dict: simplify lookup function

2017-06-15 Thread Dmitry Rubanovich

Dmitry Rubanovich added the comment:

lookdict_index() (and the rest of the lookup functions in dictobject.c) uses an 
unnecessarily complicated perturb mechanism.  And, in fact, it's slower than 
the simpler alternative.

Instead of this:

for (size_t perturb = hash;;) {
    perturb >>= PERTURB_SHIFT;
    i = mask & ((i << 2) + i + perturb + 1);


it should do this:

for (size_t perturb = hash;;) {
    i = mask & ((i << 1) + perturb + 1);
    perturb >>= PERTURB_SHIFT;


This would not only save an instruction (a minor issue), but it would also 
reduce collisions.

I've attached a file which calculates collision frequencies for demonstration 
purposes.  It shows that the calculation, as it stands right now, does not 
produce a 1-1 mapping even on the first iteration through the loop.  Moving the 
PERTURB_SHIFT to the line before the calculation does reduce the density of the 
collision space.  But using the calculation I proposed eliminates collisions on 
the first iteration completely and reduces them on most subsequent iterations.
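
For reference, here is a small Python transcription of the two recurrences so 
their probe sequences can be compared directly.  This is my own illustrative 
sketch, not the attached collisions_count.py, and the helper names are made up:

```
# Illustrative sketch only (not the attached collisions_count.py): the current
# and the proposed probe recurrences, transcribed to Python for experimentation.
PERTURB_SHIFT = 5

def probes_current(h, mask, steps):
    """First `steps` probe indices with perturb shifted *before* it is mixed in."""
    i = h & mask
    perturb = h
    out = [i]
    for _ in range(steps):
        perturb >>= PERTURB_SHIFT
        i = mask & ((i << 2) + i + perturb + 1)
        out.append(i)
    return out

def probes_proposed(h, mask, steps):
    """First `steps` probe indices with perturb mixed in *before* it is shifted."""
    i = h & mask
    perturb = h
    out = [i]
    for _ in range(steps):
        i = mask & ((i << 1) + perturb + 1)
        perturb >>= PERTURB_SHIFT
        out.append(i)
    return out

if __name__ == "__main__":
    mask = 2 ** 5 - 1  # a 32-slot table
    for h in (17, 49, 1017):
        print(h, probes_current(h, mask, 3), probes_proposed(h, mask, 3))
```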

--
components: +Interpreter Core
nosy: +Dmitry Rubanovich
type:  -> enhancement
versions: +Python 3.7
Added file: http://bugs.python.org/file46952/collisions_count.py

[issue29304] dict: simplify lookup function

2017-01-26 Thread INADA Naoki

INADA Naoki added the comment:

dict-refactoring-3.patch:

$ ../python.default -m perf compare_to default.json patched2.json -G --min-speed=2
Slower (7):
- scimark_lu: 422 ms +- 35 ms -> 442 ms +- 11 ms: 1.05x slower (+5%)
- logging_silent: 736 ns +- 7 ns -> 761 ns +- 21 ns: 1.03x slower (+3%)
- scimark_sor: 482 ms +- 8 ms -> 494 ms +- 7 ms: 1.03x slower (+3%)
- meteor_contest: 200 ms +- 2 ms -> 205 ms +- 2 ms: 1.02x slower (+2%)
- unpickle: 32.2 us +- 0.4 us -> 32.9 us +- 0.5 us: 1.02x slower (+2%)
- unpickle_pure_python: 829 us +- 13 us -> 848 us +- 14 us: 1.02x slower (+2%)
- scimark_sparse_mat_mult: 8.71 ms +- 0.32 ms -> 8.89 ms +- 0.13 ms: 1.02x slower (+2%)

Faster (8):
- unpack_sequence: 132 ns +- 2 ns -> 123 ns +- 2 ns: 1.07x faster (-7%)
- call_simple: 14.3 ms +- 0.5 ms -> 13.4 ms +- 0.3 ms: 1.07x faster (-6%)
- call_method: 15.1 ms +- 0.1 ms -> 14.5 ms +- 0.2 ms: 1.04x faster (-4%)
- mako: 40.7 ms +- 0.5 ms -> 39.6 ms +- 0.5 ms: 1.03x faster (-3%)
- scimark_monte_carlo: 266 ms +- 7 ms -> 258 ms +- 6 ms: 1.03x faster (-3%)
- chameleon: 30.4 ms +- 0.4 ms -> 29.6 ms +- 0.4 ms: 1.03x faster (-3%)
- xml_etree_parse: 319 ms +- 11 ms -> 312 ms +- 15 ms: 1.02x faster (-2%)
- pickle_pure_python: 1.28 ms +- 0.03 ms -> 1.26 ms +- 0.02 ms: 1.02x faster (-2%)

# microbench

$ ./python -m perf timeit --compare-to=`pwd`/python.default -s 'r=range(1000)' -- '{k:k for k in r}'

python.default: . 60.0 us +- 0.3 us
python: . 61.7 us +- 0.4 us

Median +- std dev: [python.default] 60.0 us +- 0.3 us -> [python] 61.7 us +- 0.4 us: 1.03x slower (+3%)

$ ./python -m perf timeit --compare-to=`pwd`/python.default -s 'ks=[str(k) for k in range(1000)]; d={k:k for k in ks}' -- 'for k in ks: d[k]'

python.default: . 37.1 us +- 0.9 us
python: . 37.7 us +- 0.9 us

Median +- std dev: [python.default] 37.1 us +- 0.9 us -> [python] 37.7 us +- 0.9 us: 1.02x slower (+2%)

Hmm, 3% slower?
I'll rerun benchmarks with PGO+LTO build.

--
Added file: http://bugs.python.org/file46424/dict-refactoring-3.patch

[issue29304] dict: simplify lookup function

2017-01-26 Thread INADA Naoki

INADA Naoki added the comment:

>  I think the patch should not be pushed without such analysis. Perhaps 
> Raymond will find time to do this.

Ok, I won't push this until an expert gives it an LGTM.

> One possible optimization is removing freeslot completely, because:
> 
> * freeslot is used only when inserting; lookup is more important.
> * insertdict() can find a dummy slot quickly by looking only at dk_indices.
>
> But before trying that optimization,

I found this is not only an optimization, but also a refactoring.
There is already a function for finding the insertion point, and insertdict()
and PyDict_SetDefault() use it in some cases.
So this can be done without adding any code.
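
As a rough illustration of that idea, here is a hypothetical Python sketch of an
insertion-slot search that consults only the index table.  The names and markers
are assumptions modelled on CPython's dk_indices layout, not the actual C helper:

```
# Hypothetical sketch: find an insertion slot by scanning only the index table,
# so no freeslot bookkeeping is needed in the lookup path.  Assumes a
# power-of-two table that is never completely full.
DKIX_EMPTY = -1
DKIX_DUMMY = -2
PERTURB_SHIFT = 5

def find_insertion_slot(indices, h):
    """Return the first slot whose index entry is not a live entry (>= 0)."""
    mask = len(indices) - 1
    i = h & mask
    perturb = h
    while indices[i] >= 0:        # a value >= 0 means the slot holds a live entry
        perturb >>= PERTURB_SHIFT
        i = mask & (i * 5 + perturb + 1)
    return i
```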

+ ../python.default -m perf compare_to default.json patched.json -G --min-speed=2
Slower (5):
- logging_silent: 736 ns +- 7 ns -> 758 ns +- 15 ns: 1.03x slower (+3%)
- unpickle: 32.2 us +- 0.4 us -> 33.0 us +- 0.3 us: 1.03x slower (+3%)
- unpickle_pure_python: 829 us +- 13 us -> 849 us +- 13 us: 1.02x slower (+2%)
- meteor_contest: 200 ms +- 2 ms -> 204 ms +- 2 ms: 1.02x slower (+2%)
- scimark_sor: 482 ms +- 8 ms -> 493 ms +- 9 ms: 1.02x slower (+2%)

Faster (6):
- unpack_sequence: 132 ns +- 2 ns -> 123 ns +- 0 ns: 1.07x faster (-7%)
- call_simple: 14.3 ms +- 0.5 ms -> 13.7 ms +- 0.3 ms: 1.05x faster (-5%)
- sqlite_synth: 10.3 us +- 0.3 us -> 9.91 us +- 0.21 us: 1.03x faster (-3%)
- mako: 40.7 ms +- 0.5 ms -> 39.8 ms +- 0.3 ms: 1.02x faster (-2%)
- xml_etree_parse: 319 ms +- 11 ms -> 311 ms +- 14 ms: 1.02x faster (-2%)
- chameleon: 30.4 ms +- 0.4 ms -> 29.8 ms +- 0.3 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (53): ...

--
Added file: http://bugs.python.org/file46423/dict-refactoring-2.patch

[issue29304] dict: simplify lookup function

2017-01-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I agree that the duplication was made for reasons, but those reasons may no longer 
be valid, or may carry less weight now, after a number of changes made in recent 
years such as shared keys and compact dicts. The overhead of the additional levels 
of indirection and the complex code of dk_get_index() can be much larger than the 
benefit of the code duplication.

I think neither macrobenchmarks nor microbenchmarks would show the true 
effect of this change. The code should be analyzed with more precise tools, 
similarly to how the set implementation was tuned. I think the patch should not 
be pushed without such analysis. Perhaps Raymond will find time to do this.

--
nosy: +serhiy.storchaka

[issue29304] dict: simplify lookup function

2017-01-26 Thread INADA Naoki

INADA Naoki added the comment:

BTW, the perturb probe uses (i << 2) + i instead of i*5.
I feel that doesn't make sense in the 21st century.  I'll change it too.
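
(For illustration only: the two spellings compute the same value, so the choice 
is purely stylistic.  This is a throwaway check of my own, not part of the patch.)

```
# Quick sanity check that (i << 2) + i and i*5 give the same probe index.
mask = 2 ** 10 - 1
assert all((mask & ((i << 2) + i + 1)) == (mask & (i * 5 + 1)) for i in range(1 << 14))
```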

--

[issue29304] dict: simplify lookup function

2017-01-26 Thread INADA Naoki

INADA Naoki added the comment:

Digging through the history, the duplicated code was introduced here (1997-01-17):

https://github.com/python/cpython/commit/99304174680d4c724476dad300ae7fc638842bf0#diff-2131209d0deb0e50c93a88ec6c7b0d52

/* Optimizations based on observations by Jyrki Alakuijala
   (paraphrased):
   - This routine is very heavily used, so should be AFAP
   (As Fast As Possible).
   - Most of the time, the first try is a hit or a definite
   miss; so postpone the calculation of incr until we know the
   first try was a miss.
   - Write the loop twice, so we can move the test for
   freeslot==NULL out of the loop.
   - Write the loop using pointer increments and comparisons
   rather than using an integer loop index.
   Note that it behooves the compiler to calculate the values
   of incr*sizeof(*ep) outside the loops and use this in the
   increment of ep.  I've reduced the number of register
   variables to the two most obvious candidates.

At the time, there was a single lookdict function, and three tries.

* The comment says the first try was for skipping the `incr` calculation.
  While `incr` has already been removed, I think this now amounts to skipping
  the freeslot==NULL test.

* After the first try, there were two loops so the freeslot==NULL test could be
  skipped until the first dummy was found.  That optimization looks gone now.


As I said above, lookdict_unicode_nodummy and lookdict_split only search
tables without dummies.
And lookdict_unicode() and lookdict() are not as important as lookmapping() was in
1997, so duplicating code only to skip one freeslot==NULL test doesn't make sense.

One possible optimization is removing freeslot completely, because:

* freeslot is used only when inserting; lookup is more important.
* insertdict() can find a dummy slot quickly by looking only at dk_indices.

But before trying that optimization, I suggest removing the duplicated code first.
The macro benchmarks don't show a significant difference, at least.

--

[issue29304] dict: simplify lookup function

2017-01-17 Thread INADA Naoki

INADA Naoki added the comment:

I've attached the wrong file.  This patch is the first version I wanted to post.
It dedupes all five functions.

--
Added file: http://bugs.python.org/file46325/dict-refactor.patch

[issue29304] dict: simplify lookup function

2017-01-17 Thread INADA Naoki

Changes by INADA Naoki :


Removed file: http://bugs.python.org/file46324/dictlook-refactoring.patch

[issue29304] dict: simplify lookup function

2017-01-17 Thread INADA Naoki

INADA Naoki added the comment:

Raymond, I understand your worries.  I won't commit this until I do a more 
precise survey.

But the reason I'm trying this is not only that I found duplicated code.
I think the benefit of this technique has been reduced for historical reasons.


In Python 2.7, there are only two lookup functions: lookdict and 
lookdict_string.
Both functions support dicts containing dummy entries.

In the first comparison part of lookdict_string():

if (ep->me_key == NULL || ep->me_key == key)
    return ep;
if (ep->me_key == dummy)
    freeslot = ep;
else {
    if (ep->me_hash == hash && _PyString_Eq(ep->me_key, key))
        return ep;
    freeslot = NULL;
}

And similar code in the same function, but inside the loop:

if (ep->me_key == NULL)
    return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
    || (ep->me_hash == hash
        && ep->me_key != dummy
        && _PyString_Eq(ep->me_key, key)))
    return ep;
if (ep->me_key == dummy && freeslot == NULL)
    freeslot = ep;

The first part can ignore whether the freeslot variable is NULL or not.  But the 
second lookup can't.
Since most lookups are done on the first try, this technique may have had a 
significant effect.


But now we have five lookdict functions (if we include 
lookdict_index).

Three of them (lookdict_unicode_nodummy, lookdict_split, and lookdict_index) 
care only about dicts without dummies.
They don't have a freeslot variable, so the first part and the loop body are 
almost the same.
The performance benefit from this technique may be negligible.

For the remaining two functions (lookdict_unicode and lookdict), this technique 
may still have an effect.
But their performance is not as important now as it was in Python 2.7.
They were used for all name lookups in Python 2.7, but now they are only the 
fallback for lookdict_split and lookdict_unicode_nodummy.

On the other hand, having 2.5x as many lookdict functions (2 -> 5) means the 
maintenance cost of the duplicated code has increased.


To start, I'll take the three functions which don't take care of dummies.
I think there will be almost zero performance regression there.

--

[issue29304] dict: simplify lookup function

2017-01-17 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I worry you guys are rapidly sloshing around and churning code that was 
developed slowly and methodically over many years.  A lot of people have looked 
at and tweaked the dictionary code, but now it seems like there is a 
rewrite-everything festival.  When looking at potential changes, please 
consider that the people who came before you knew what they were doing.  If 
there is a pre-loop step that could have been folded into a single loop to save 
a little code, it isn't there because the people who wrote it are daft.  That 
expansion was made for a reason.

Even if you get an improved timing here or there, consider that it may vary 
from compiler to compiler, from os to os, and may vary widely across different 
kinds of applications.  Branch prediction and cache effect issues don't readily 
show up in simple timing loops.

Particularly as a new core dev, I recommend resisting the urge to rewrite 
everything you touch.  That will only destabilize Python.  In the past couple 
of years, two or three devs have churned an enormous amount of code.

--
nosy: +rhettinger

[issue29304] dict: simplify lookup function

2017-01-17 Thread Xiang Zhang

Xiang Zhang added the comment:

Ahh, I got the same idea before. But then I persuaded myself that the first 
"lookup()" was deliberately extracted from the loop since it's highly possible 
there is no collision and you can hit the result (empty or not) the first time.

--
nosy: +xiang.zhang

[issue29304] dict: simplify lookup function

2017-01-17 Thread INADA Naoki

New submission from INADA Naoki:

All lookdict* functions are implemented like this pseudocode:

```
lookup()
if not collision:
    return result

while True:
    perturb_shift()
    lookup()
    if not collision:
        return result
```

This patch changes it as:

```
while True:
    lookup()
    if not collision:
        return result
    perturb_shift()
```
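
As a concrete (and simplified) illustration of the unified shape, here is a 
generic open-addressing lookup written that way in Python.  The names and data 
layout are my own sketch, not the CPython C code, and it ignores dummies, 
freeslot and the unicode specializations:

```
# Simplified illustration of the unified loop shape: probe, test, then perturb,
# all inside one loop.  `indices` is a power-of-two list with None for empty
# slots; `entries` is a list of (hash, key, value) tuples.
PERTURB_SHIFT = 5

def lookup(indices, entries, key, h):
    """Return the slot index holding `key`, or the first empty slot."""
    mask = len(indices) - 1
    i = h & mask
    perturb = h
    while True:
        ix = indices[i]
        if ix is None:                      # empty slot: key is absent
            return i
        ehash, ekey, _ = entries[ix]
        if ehash == h and ekey == key:      # hit
            return i
        perturb >>= PERTURB_SHIFT           # collision: advance the probe
        i = mask & (i * 5 + perturb + 1)

if __name__ == "__main__":
    # Tiny 8-slot table holding one entry for the key "spam".
    entries = [(hash("spam"), "spam", 42)]
    indices = [None] * 8
    indices[hash("spam") & 7] = 0
    print(lookup(indices, entries, "spam", hash("spam")))
```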

It removes 100 lines of code. Good. But how about performance?

When this patch is applied to 4a534c45bbf6:

```
$ ../python.patched -m perf compare_to default.json patched2.json -G --min-speed=2
Slower (4):
- xml_etree_generate: 271 ms +- 6 ms -> 283 ms +- 9 ms: 1.04x slower (+4%)
- nqueens: 263 ms +- 4 ms -> 272 ms +- 3 ms: 1.04x slower (+4%)
- scimark_monte_carlo: 272 ms +- 10 ms -> 280 ms +- 14 ms: 1.03x slower (+3%)
- scimark_lu: 435 ms +- 23 ms -> 446 ms +- 32 ms: 1.03x slower (+3%)

Faster (7):
- call_method: 15.2 ms +- 0.2 ms -> 14.7 ms +- 0.4 ms: 1.04x faster (-4%)
- call_simple: 14.4 ms +- 0.2 ms -> 13.9 ms +- 0.3 ms: 1.04x faster (-4%)
- xml_etree_iterparse: 227 ms +- 9 ms -> 219 ms +- 7 ms: 1.04x faster (-3%)
- scimark_sor: 527 ms +- 10 ms -> 510 ms +- 11 ms: 1.03x faster (-3%)
- call_method_slots: 14.7 ms +- 0.5 ms -> 14.3 ms +- 0.2 ms: 1.03x faster (-3%)
- genshi_text: 90.2 ms +- 1.1 ms -> 87.8 ms +- 1.1 ms: 1.03x faster (-3%)
- django_template: 403 ms +- 5 ms -> 394 ms +- 4 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (53): 2to3, ...
```

And when this patch is applied to 1a97b10cb420:

```
$ ../python.patched -m perf compare_to default.json patched.json -G --min-speed=2

Slower (6):
- call_simple: 13.5 ms +- 0.5 ms -> 14.4 ms +- 0.4 ms: 1.07x slower (+7%)
- xml_etree_generate: 270 ms +- 6 ms -> 287 ms +- 5 ms: 1.06x slower (+6%)
- xml_etree_process: 240 ms +- 6 ms -> 247 ms +- 4 ms: 1.03x slower (+3%)
- regex_compile: 429 ms +- 3 ms -> 440 ms +- 5 ms: 1.03x slower (+3%)
- call_method_unknown: 16.1 ms +- 0.2 ms -> 16.5 ms +- 0.3 ms: 1.02x slower (+2%)
- logging_simple: 31.2 us +- 0.4 us -> 32.0 us +- 0.3 us: 1.02x slower (+2%)

Faster (8):
- genshi_text: 90.6 ms +- 1.4 ms -> 87.6 ms +- 1.2 ms: 1.03x faster (-3%)
- scimark_sor: 513 ms +- 11 ms -> 497 ms +- 12 ms: 1.03x faster (-3%)
- genshi_xml: 200 ms +- 2 ms -> 194 ms +- 2 ms: 1.03x faster (-3%)
- unpickle_pure_python: 857 us +- 21 us -> 835 us +- 13 us: 1.03x faster (-3%)
- python_startup_no_site: 9.95 ms +- 0.02 ms -> 9.74 ms +- 0.02 ms: 1.02x faster (-2%)
- json_dumps: 29.7 ms +- 0.4 ms -> 29.1 ms +- 0.4 ms: 1.02x faster (-2%)
- xml_etree_iterparse: 225 ms +- 9 ms -> 220 ms +- 5 ms: 1.02x faster (-2%)
- chameleon: 31.1 ms +- 0.3 ms -> 30.5 ms +- 0.5 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (50): 2to3, ...
```

I can't see any stable and significant performance regression.
I'll try to create some micro benchmarks.

--
files: dictlook-refactoring.patch
keywords: patch
messages: 285695
nosy: inada.naoki
priority: normal
severity: normal
status: open
title: dict: simplify lookup function
Added file: http://bugs.python.org/file46324/dictlook-refactoring.patch
