[issue33205] GROWTH_RATE prevents dict shrinking

2022-01-26 Thread Brandt Bucher


Change by Brandt Bucher :


--
nosy: +brandtbucher

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2022-01-26 Thread Inada Naoki


Inada Naoki  added the comment:

We do not have *fill* since Python 3.6.
There is a `dk_nentries` instead. But when `insertion_resize()` is called, 
`dk_nentries` is equal to `USABLE_FRACTION(dk_size)` (dk_size is `1 << 
dk_log2_size` for now). So it is different from *fill* in the old dict.

I chose `dk_used*3` as GROWTH_RATE because it reserves more spaces when there 
are dummies than when there is no dummy, as I described in the same comment:

> In case of dict growing without deletion, dk_size is doubled for each resize 
> as current behavior.
> When there are deletion, dk_size is growing aggressively than Python 3.3 
> (used*2 -> used*3).  And it allows dict shrinking after massive deletions.

For example, when current dk_size == 16 and USABLE_FRACTION(dk_size) == 10, new 
dk_size is:

* used = 10 (dummy=0) -> 32 (31.25%)
* used = 9 (dummy=1)  -> 32 (28.125%)
(snip)
* used = 6 (dummy=4)  -> 32 (18.75%)
* used = 5 (dummy=5)  -> 16 (31.25%)
* used = 4 (dummy=6)  -> 16 (25%)
(snip)
* used = 2 (dummy=8)  -> 8 (25%)

As you can see, dict is more sparse when there is dummy than when there is no 
dummy, except used=5/dummy=5 case.

There may be a small room for improvement, especially for `used=5/dummy=5` 
case. But I am not sure it is worth enough to use more complex GROWTH_RATE than 
used*3.
Any good idea?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2022-01-25 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2022-01-25 Thread STINNER Victor


Change by STINNER Victor :


--
nosy:  -vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2022-01-24 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Should this have been "filled*3" rather than "used*3"?

The intent was to give a larger resize to dict that had a lot of dummy entries 
and a smaller resize to dicts without deletions.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-17 Thread INADA Naoki

Change by INADA Naoki :


--
resolution:  -> fixed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-17 Thread INADA Naoki

Change by INADA Naoki :


--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-17 Thread miss-islington

miss-islington  added the comment:


New changeset 902bb62d5af21526b68892a1032c63aa86ded247 by Miss Islington (bot) 
in branch '3.7':
bpo-33205: dict: Change GROWTH_RATE to `used*3` (GH-6350)
https://github.com/python/cpython/commit/902bb62d5af21526b68892a1032c63aa86ded247


--
nosy: +miss-islington

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-17 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6198

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-17 Thread INADA Naoki

INADA Naoki  added the comment:


New changeset 5fbc511f56688654a05b9eba23d140318bb9b2d5 by INADA Naoki in branch 
'master':
bpo-33205: dict: Change GROWTH_RATE to `used*3` (GH-6350)
https://github.com/python/cpython/commit/5fbc511f56688654a05b9eba23d140318bb9b2d5


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-16 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

The capacity of the dict is 2/3 of its hashtable size: dk_usable < 2/3 * 
dk_size.

Currently the dict grows if dk_usable > 1/4 * dk_size, and preserves the size 
if dk_usable < 1/4 * dk_size. Note that it it can grow twice if dk_usable > 1/2 
* dk_size.

With the proposed change the dict will grow only if dk_usable > 1/3 * dk_size, 
preserve the size if 1/6 * dk_size < dk_usable < 1/3 * dk_size, and shrink if 
dk_usable < 1/6 * dk_size. After growing once it will no need to grow again 
until the number of item be increased.

This LGTM.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-05 Thread INADA Naoki

INADA Naoki  added the comment:

@Mark.Shannon, @rhettinger How do you think this?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-02 Thread INADA Naoki

Change by INADA Naoki :


--
keywords: +patch
pull_requests: +6060
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-02 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-02 Thread INADA Naoki

INADA Naoki  added the comment:

# cap2.json is master (used*2 + dk_size/2)
# used3.json is patched (used*3)
$ ./python-master -m perf compare_to cap2.json used3.json  -G
Slower (2):
- rand_access(size=20): 2.67 ms +- 0.01 ms -> 2.80 ms +- 0.04 ms: 1.05x slower 
(+5%)
- rand_access(size=10): 2.70 ms +- 0.03 ms -> 2.80 ms +- 0.04 ms: 1.04x slower 
(+4%)

Faster (1):
- rand_access(size=50): 2.76 ms +- 0.06 ms -> 2.74 ms +- 0.02 ms: 1.01x faster 
(-1%)

Benchmark hidden because not significant (5): rand_access(size=2), 
rand_access(size=5), rand_access(size=100), rand_access(size=200), 
rand_access(size=500)

--
Added file: https://bugs.python.org/file47512/dict_rand.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33205] GROWTH_RATE prevents dict shrinking

2018-04-02 Thread INADA Naoki

New submission from INADA Naoki :

GROWTH_RATE is changed from (used*2) to (used*2 + dk_size/2) in #17563, at 
Python 3.4.
It was for avoid resizing dict for massive del/insert use case, by increasing 
possibility of overwriting DUMMY entry.

>From Python 3.6, there are no DUMMY entry.  While there are dummy keys, we 
>resize (repack) when entries are full.
So there are no benefit from "possibility of overwriting dummy entry".

(Of course, there are still benefit from slow repacking rate for new dict.
So I don't propose to change it back to (used*2), but (used*3).)


This GROWTH_RATE prevents dict is shrinked in insertion_resize().

For example, consider this dict:

>>> d = dict.fromkeys(range(10900))
>>> len(d)
10900
>>> sys.getsizeof(d)
295008
>>> for i in range(10900):
... del d[i]
...
>>> len(d)
0
>>> sys.getsizeof(d)
295008

`del d[i]` doesn't shrink the dict.  This is another issue (#32623).

Current dk_size is 16384 and entries length is dk_size * 2 // 3 = 10922.
So dictresize will called when next 923 entries are added.

New dk_size is round_up_to_power_of_two(922 + 16384/2) = 16384.
So dict is not shrinked!

>>> for i in range(923):
... d[i] = i
...
>>> sys.getsizeof(d)
295008

round_up_to_power_of_two(used + dk_size/2) means dict is shrinked only when 
used == 0.

I propose changing GROWTH_RATE again to `used*3` from `used*2 + dk_size/2`

In case of dict growing without deletion, dk_size is doubled for each resize as 
current behavior.
When there are deletion, dk_size is growing aggressively than Python 3.3 
(used*2 -> used*3).  And it allows dict shrinking after massive deletions.

>>> import sys
>>> d = dict.fromkeys(range(10900))
>>> sys.getsizeof(d)
295008
>>> for i in range(10900):
... del d[i]
...
>>> len(d)
0
>>> sys.getsizeof(d)
295008
>>> for i in range(923):
... d[i] = i
...
>>> sys.getsizeof(d)
36968


I want to backport this change to Python 3.7.  While it's beta3, "dict can't be 
shrinked unless empty" behavior looks resource usage bug to me.

--
components: Interpreter Core
messages: 314806
nosy: Mark.Shannon, Yury.Selivanov, eric.snow, inada.naoki, rhettinger, vstinner
priority: normal
severity: normal
status: open
title: GROWTH_RATE prevents dict shrinking
type: resource usage
versions: Python 3.7, Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com