[issue31105] Cyclic GC threshold may need tweaks

2017-08-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> As a fast approximation the default implementation of the __sizeof__ method 
> can be used

This doesn't handle further resizing of containers, though.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31105] Cyclic GC threshold may need tweaks

2017-08-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

There is an existing protocol for estimating the size of allocated memory. 
__sizeof__. Python API sys.getsizeof() and C API _PySys_GetSizeOf(). But it is 
too heavy for using in GC.

As a fast approximation the default implementation of the __sizeof__ method can 
be used:

tp_basicsize + tp_itemsize * Py_SIZE(self)

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31105] Cyclic GC threshold may need tweaks

2017-08-03 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Yes, this is a potentially annoying limitation of the current scheme.

Note that it's not obvious how to proceed for resizable objects such as lists. 
Should the list inform the GC of every internal resizing? May a list resize 
trigger a GC run?

Also, a small container may keep alive a large non-GC object (think a 2-element 
list containing itself and a 1 GB bytestring).  How will the GC know?

--
nosy: +pitrou
type:  -> enhancement
versions:  -Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31105] Cyclic GC threshold may need tweaks

2017-08-02 Thread Armin Rigo

New submission from Armin Rigo:

The cyclic GC uses a simple and somewhat naive policy to know when it must run. 
 It is based on counting "+1" for every call to _PyObject_GC_Alloc().  Explicit 
calls to PyObject_GC_Del() are counted as "-1".  The cyclic GC will only be 
executed after the count reaches 700.  There is then a scheme with multiple 
generations, but the point is that nothing is done at all before 
_PyObject_GC_Alloc() has been called 700 times.

The problem is that each of these _PyObject_GC_Alloc() can be directly or 
indirectly responsible for a large quantity of memory.  Take this example:

while True:
l = [None] * 1000# 80 MB, on 64-bit
l[-1] = l
del l

This loop actually consumes 700 times 80 MB, which is unexpected to say the 
least, and looks like a very fast memory leak.  The same program on 32-bit 
architectures simply runs out of virtual address space and fails with a 
MemoryError---even if we lower the length of the list to 10**9/700 = 1428571.

The same problem exists whenever a single object is "large", we allocate and 
forget many such objects in sequence, and they are kept alive by a cycle.  This 
includes the case where the large object is not part of a cycle, but merely 
referenced from a cycle.  For examples of "large" objects with potentially low 
lifetimes, maybe more natural than large lists, would include bz2 objects (17MB 
each) or Numpy arrays.

To fix it, the basic idea would be to have the "large" allocations count for 
more than "+1" in _PyObject_GC_Alloc().  Maybe they would also need to decrease 
the count by the same amount in PyObject_GC_Del(), though that may be less 
important.  Still, I am unsure about how it could be implemented.  Maybe a new 
C API is needed, which could then be used by a few built-in types (lists, bz2 
objects, numpy arrays...) where the bulk of the memory allocation is not 
actually done by _PyObject_GC_Alloc() but by a separate call.  I am thinking 
about something like PyMem_AddPressure(size), which would simply increase the 
count by a number based on 'size'.

--
components: Interpreter Core
messages: 299656
nosy: arigo
priority: normal
severity: normal
status: open
title: Cyclic GC threshold may need tweaks
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com