[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2016-12-15 Thread INADA Naoki

INADA Naoki added the comment:

> Fixed fromkeys() in Py2.7.  Stills needs to be forward ported to 3.4/3.5.

dict.fromkeys() is fixed already in 3.6+.

dict(l) doesn't presize the dict.  If it's desirable, let's create a new issue.

--
nosy: +inada.naoki
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Fixed fromkeys() in Py2.7.  Stills needs to be forward ported to 3.4/3.5.

--
assignee: rhettinger - 

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Roundup Robot

Roundup Robot added the comment:

New changeset cd0706499812 by Raymond Hettinger in branch '2.7':
Issue #23971:  Fix underestimated presizing in dict.fromkeys()
https://hg.python.org/cpython/rev/cd0706499812

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

An example of uncollectable loop if tp_clear is not implemented:

class A:
@property
def f(self): pass

A.f.__doc__ = (A.f,)

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
Removed message: http://bugs.python.org/msg243078

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

-if (dictresize(mp, Py_SIZE(seq))) {
+if (dictresize(mp, Py_SIZE(seq) / 2 * 3)) {


If Py_SIZE(seq) is 1, dictresize argument is 0.

Why not wryte the expression as Py_SIZE(seq) * 3 / 2? It never overflows, 
because Py_SIZE(seq) is the size of allocated array of pointers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Mark Dickinson

Mark Dickinson added the comment:

From the patch:

-if (dictresize(mp, Py_SIZE(seq))) {
+if (dictresize(mp, Py_SIZE(seq) / 2 * 3)) {

Isn't there a risk of signed overflow here?  The dictresize function has an 
`assert(minused = 0)`, which is going to fail for values of `Py_SIZE(seq)` 
that are close to `PY_SSIZE_T_MAX` in the common case where signed overflow 
just wraps modulo the appropriate power of 2 (though it's undefined behaviour, 
so in theory it *could* do anything).

--
nosy: +mark.dickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Mark Dickinson

Mark Dickinson added the comment:

 It never overflows, because Py_SIZE(seq) is the size of allocated array of 
 pointers.

Ah, good point.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

No, I just asking. If the minimum dict size is 8 and there is no special case 
for empty dicts, all is good to me.

But note, that x / 2 * 3 overflows as well as x * 3 / 2.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-05-13 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 If Py_SIZE(seq) is 1, dictresize argument is 0.

That doesn't matter.  The minimum dict size is 8.

 Why not write Py_SIZE(seq) * 3 / 2?

Because I prefer the / 2 * 3 style so I don't have to think about whether seq 
can overflow.  Do you really feel like bike-shedding this?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-18 Thread Gregory P. Smith

Gregory P. Smith added the comment:

fwiw, as for 2.7 i personally don't think I would change its behavior around 
this at this point.  make sure 3.5+ do something desirable.  (my link to 
dictobject.c above is from 2.7)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-18 Thread Gregory P. Smith

Gregory P. Smith added the comment:

Won't we always consume the memory thanks to a memset(newtable, 0, ...) 
https://hg.python.org/cpython/file/df28044b7e14/Objects/dictobject.c#l654 ?

(also, i'm not sure if Windows is allocates mapped pages on demand as posix 
systems tend to)

--
nosy: +gregory.p.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-17 Thread Matt Mackall

Matt Mackall added the comment:

We already presize the output dict and have for ages. The question here is what 
size to use. In the current state, we use twice as much memory and CPU as 
necessary quite often because we end up spilling and growing... even though we 
apparently intentionally sized the object to fit.

In the sparse case, if we allocate a 1GB dictionary and use two entries in it, 
we will have consumed 1GB of address space but 1kB of actual memory.

--
nosy: +Matt.Mackall

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-16 Thread Josh Rosenberg

Josh Rosenberg added the comment:

You shouldn't presize when the input is a potentially non-unique iterable. It 
makes sense to presize for inputs of type dict or set, but for a list, the 
assumption is that many duplicates will be present, since deduping is a common 
use case for dicts (less so with the advent of sets, but still common enough to 
support without massive overallocation of memory).

In short, dict([(1, 2)] * 100) should not involve allocating GB of memory 
for the dict up front.

--
nosy: +josh.r

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-15 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +rhettinger
type: behavior - performance

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-15 Thread Larry Hastings

New submission from Larry Hastings:

Matt Mackall brought this over to me today.

If list l has one million pairs and you do
  dict(l)
it creates the dict, then resizes it to a million elements, then starts adding 
name/value pairs.  But since it's sized to a million elements, it gets beyond 
its fill ratio comfort level of 666k and resizes.  And when you're up to a 
million elements, the resize is quite slow.  It should have internally resized 
the dict correctly from the beginning such that no resize was necessary.

The same is true for dict.fromkeys--passed in a list of 1m elements, it uses an 
initial size of 1m, not 1.5m.

Attached is a sample patch to change the behavior of both these operations so 
no resizes are needed during the insertions.  I haven't checked that it 
actually fixes the behavior, I just made the change and ran the regression 
test.  (I'm oversubscribed here at the sprints, and am kind of rushing this 
diff out just to get the conversation rolling.)

Benjamin: what do you think?  Would it be appropriate to make a change like 
this in 2.7?

Python 3 (or at least 3.5 trunk) is fancy here.  It starts with the minimum 
size of a dict, then iterates until the new size is  1.5*the  number of 
elements.  This means the dicts it creates are of the same size as they would 
be if we'd started with a minsize dict and inserted one element at a time.  
This might help minimize wear and tear on the small block allocator for smaller 
dicts.

BTW, the internal function _PyDict_NewPresize should really have taken the fill 
ratio into account too.  But, even though it's an internal-only function, it's 
probably too late to change its behavior.

--
components: Interpreter Core
messages: 241179
nosy: benjamin.peterson, larry, marmoute, mpm
priority: normal
severity: normal
stage: patch review
status: open
title: dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23971] dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio

2015-04-15 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
assignee:  - rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23971
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com