Mark Shannon <[email protected]> added the comment:
Jim Jewett wrote:
> Jim Jewett <[email protected]> added the comment:
>
> As of Feb 28, 2012, the PEP mentions an additional optimization of storing
> the values in an array indexed by (effectively) key insertion order, rather
> than key position. ("Alternative Implementation")
>
> It states that this would reduce memory usage for the values array by 1/3.
> 1/3 is a worst-case measurement; average is 1/2. (At savings of less than
> 1/3, the keys would resize, to initial savings of 2/3. And yes, that means
> in practice, the average savings would be greater than half, because the
> frequency of dicts of size N decreases with N.)
>
I presume you mean to allocate a values array of size == actual keys
rather than size == usable keys.
Making the values array smaller than the the number of usable keys
causes a number of issues:
1. The values_size will need to be kept in the dict, not in the keys,
which would make the dict larger for size 3 or 5, the same size for
size 2 or 4, but it would save memory for sizes 6-8 (see below).
So it might not save memory at all, it depends on the distribution of
the sizes of shared-key dicts.
2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order
to quickly allocate values arrays. This would slow down the resizing
check, which is now done before insertion, so might be slower.
(not really a problem, but it might slow down inserts)
3. An extra check needs to be performed for each read to ensure the
index is in-bounds
(again not really a problem, but it might slow down reads)
Comparative sizes of values array (including size field):
Items Proposed Alternative Other Alternative Delta
(size == usuable) (size == keys (+1))
1 4 3 2 -1
2 4 3 3 0
3 4 3 4 +1
4 8 5 5 0
5 8 5 6 +1
6 16 10 7 -3
7 16 10 8 -2
8 16 10 9 -1
9 16 10 10 0
10 16 10 11 +1
12 32 21 13 -8
14 32 21 15 -6
The memory savings of the two alternative implementations are broadly
similar.
Clearly, either of the alternatives are attractive in terms of memory
use. I think it is definitely worth investigating further, but I would
like to get the proposed implementation into CPython first.
> It states that the keys table would need an additional "values_size" field,
> but in the absence of dummies, this is just ma_used.
values_size != ma_used
values_size is the size of the values array, not the number of values.
Don't forget deletions or the case where items are inserted in a
different order from that of another dict sharing the same keys.
Although there are no dummies, (key != NULL, value == NULL) is a legal
pair representing a missing or deleted value.
Cheers,
Mark.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13903>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com