[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-03-03 Thread Mark Shannon


Change by Mark Shannon :


--
pull_requests: +29777
pull_request: https://github.com/python/cpython/pull/31659

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-03-01 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-03-01 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c by Inada Naoki in branch 
'main':
bpo-46845: Reduce dict size when all keys are Unicode (GH-31564)
https://github.com/python/cpython/commit/9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-26 Thread Inada Naoki


Inada Naoki  added the comment:

I added _PyDict_FromItems() to the PR.
It checks that all keys are Unicode or not before creating dict.
_PyDict_NewPresized() just returns general-purpose dict. But it isn't used from 
CPython core. It is just kept for compatibility (for Cython).

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{"k1":1, "k2":2, 
"k3":3, "k4":4, "k5":5, "k6":6}'
/home/inada-n/work/python/cpython/python: . 198 ns +- 5 ns
/home/inada-n/work/python/dict-compact/python: . 213 ns +- 
6 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 198 ns +- 5 ns -> 
[/home/inada-n/work/python/dict-compact/python] 213 ns +- 6 ns: 1.07x slower
```

Overhead of checking keys types is not so large.
Additionally, we can reduce some code from ceval.c.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-26 Thread Inada Naoki


Inada Naoki  added the comment:

In most case, first PyDict_SetItem decides which format should be used.

But _PyDict_NewPresized() can be a problem. It creates a hash table before 
inserting the first key, when 5 < (expected size) < 87382.

In CPython code base, _PyDict_NewPresized() is called from three places:

1. call.c: Building kwargs dict -- all key should be Unicode.
2. ceval.c: BUILD_MAP and BUILD_CONST_KEY_MAP -- there is no guarantee that all 
keys are Unicode.


Current pull request assumes the dict keys are unicode-only key. So building 
dict from non-Unicode keys become slower.

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, 
(4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: . 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: . 328 ns +- 
6 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> 
[/home/inada-n/work/python/dict-compact/python] 328 ns +- 6 ns: 1.41x slower
```

There are some approaches to fix this problem:

1. Don't use _PyDict_NewPresized() in BUILD_MAP, BUILD_CONST_KEY_MAP

```
$ ./python -m pyperf timeit --compare-to ../cpython/python -- '{(1,2):3, 
(4,5):6, (7,8):9, (10,11):12, (13,14):15, (16,17):18}'
/home/inada-n/work/python/cpython/python: . 233 ns +- 1 ns
/home/inada-n/work/python/dict-compact/python: . 276 ns +- 
1 ns

Mean +- std dev: [/home/inada-n/work/python/cpython/python] 233 ns +- 1 ns -> 
[/home/inada-n/work/python/dict-compact/python] 276 ns +- 1 ns: 1.18x slower
```

I think this performance regression is acceptable level.

2. Add an argument `unicode` to _PyDict_NewPresized(). -- Breaks some 3rd party 
codes using internal APIs.
3. Add a new internal C API such that _PyDict_NewPresizedUnicodeKey(). -- Most 
conservative.
4. Add a new internal C API that creates dict form keys and values for extreme 
performance, like this:

// Create a new dict from keys and values.
// Items are received as `{keys[i*keys_offset]: values[i*values_offset] for i 
in range(length)}`.
// When distinct=1, this function skips checking duplicated keys.
// So pass distinct=1 unless you can guarantee that there is no duplicated keys.
PyObject *
PyDict_FromKeysAndValues(PyObject **keys, Py_ssize_t keys_offset, PyObject 
**values, Py_ssize_t values_offset, Py_ssize_t lenghh, int distincit)
{
}

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-25 Thread Inada Naoki


Inada Naoki  added the comment:

>
>
> Do you propose to
> 1. Only use StringKeyDicts when non-string keys are not possible?  (Where
> would this be?)
> 2. Switch to a normal dict when a non-string key is added?  (But likely
> not switch back when the last non-string key is removed.)
> 3. Deprecate and remove the option to add non-string keys to namespace
> dicts?  (Proposed and rejected at least once as not gaining much.)
>
>
>

2. We already do such hack for key sharing dict.
And yes, deleting non string key doesn't switch back. d[0]=0; del d[0];
loop must be amortized O(1).
Only dict.clear() switches back.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-25 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

CPython, at least, allows users to insert non-string keys in namespace dicts 
that are conceptually string-key only.

>>> globals()[0] = 'zero'
>>> globals()[0]
'zero'
>>> vars()
{'__name__': '__main__', ..., 0: 'zero'}

[This is for consenting adults only, as it prevents sorting keys and 
string-only operations on keys.
>>> dir()
...
TypeError: '<' not supported between instances of 'int' and 'str']
 
Do you propose to
1. Only use StringKeyDicts when non-string keys are not possible?  (Where would 
this be?)
2. Switch to a normal dict when a non-string key is added?  (But likely not 
switch back when the last non-string key is removed.)
3. Deprecate and remove the option to add non-string keys to namespace dicts?  
(Proposed and rejected at least once as not gaining much.)

--
nosy: +terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-24 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +29686
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31564

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46845] dict: Use smaller entry for Unicode-key only dict.

2022-02-23 Thread Inada Naoki


New submission from Inada Naoki :

Currently, PyDictKeyEntry is 24bytes (hash, key, and value).

We can drop the hash from entry when all keys are unicode, because unicode 
objects caches hash already.

This will cause some performance regression on microbenchmark because dict need 
one more indirect access to compare hash value.

On the other hand, this will reduce some RAM usage. Additionally, unlike 
docstrings and annotations, this includes much **hot** RAM. It will make Python 
more cache efficient.

This is work in progress code: https://github.com/methane/cpython/pull/43
pypeformance result is in the PR too.

--
components: Interpreter Core
messages: 413892
nosy: Mark.Shannon, methane, rhettinger
priority: normal
severity: normal
status: open
title: dict: Use smaller entry for Unicode-key only dict.
type: performance
versions: Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com