[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> I'm afraid nailing anything here is hard.  For example, I don't know what you 
> did to "measure memory", but if you're using stats reported by the OS, that's 
> fraught with dangers too. 

I am interposing a custom allocator to track all the used blocks in all the 
used pools in all arenas, and then monitoring all the mmap/malloc allocations. 
I get the same results if I don't use obmalloc (setting the PYTHONMALLOC=malloc 
env var), which is more straightforward as I can just interpose a malloc 
implementation using LD_PRELOAD or a simple custom allocator. I don't claim to 
be exact, but it is better than just checking the resident size.
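
For anyone without an interposed allocator at hand, a rough in-process 
approximation (not the tooling used here, just a sketch using the standard 
tracemalloc module) would be:

    import tracemalloc

    tracemalloc.start()
    # ... run the workload being measured; this stand-in just allocates ...
    data = [[] for _ in range(100_000)]
    current, peak = tracemalloc.get_traced_memory()
    print(f"current={current} bytes, peak={peak} bytes")
    tracemalloc.stop()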

I didn't want to use sys._debugmallocstats() so that printing the stats would 
not impact performance, as I was also checking that the code does not make the 
interpreter slower.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Tim Peters


Tim Peters  added the comment:

All right!  So, at a first look, "buffering" isn't an obvious disaster ;-)

I'm afraid nailing anything here is hard.  For example, I don't know what you 
did to "measure memory", but if you're using stats reported by the OS, that's 
fraught with dangers too.  Python's small-object allocator makes only weak 
attempts to return memory "to C", which in turn may or may not return memory to 
"the system".

One way to do better is to call `sys._debugmallocstats()` and stare at the 
output.  The "# bytes in allocated blocks" output line is an exact count of the 
number of bytes pymalloc is currently hanging on to for objects that have been 
allocated but not yet freed.  The point is that C - and the OS - have nothing 
to do with this value.  The downside:  objects > 512 bytes aren't included at 
all in this (pymalloc passes on requests for > 512 bytes to the system 
malloc(), and doesn't keep track of them).
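
For instance, a hedged sketch of grabbing that number programmatically 
(sys._debugmallocstats() writes to the C-level stderr, so fd 2 has to be 
redirected; the exact output format is an implementation detail and may change 
between versions):

    import os
    import re
    import sys
    import tempfile

    def allocated_block_bytes():
        """Return pymalloc's "# bytes in allocated blocks" as an int."""
        with tempfile.TemporaryFile(mode="w+b") as tmp:
            saved = os.dup(2)                # keep the real stderr alive
            try:
                os.dup2(tmp.fileno(), 2)     # point fd 2 at the temp file
                sys._debugmallocstats()
            finally:
                os.dup2(saved, 2)            # restore stderr
                os.close(saved)
            tmp.seek(0)
            text = tmp.read().decode()
        match = re.search(r"# bytes in allocated blocks\s*=\s*([\d,]+)", text)
        return int(match.group(1).replace(",", ""))

    before = allocated_block_bytes()
    junk = [[] for _ in range(100_000)]      # stand-in workload
    print(allocated_block_bytes() - before, "bytes now held by pymalloc")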

Anyway, so far, so good!  Looks worth pursuing :-)

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

I have attached two files to the issue. One of them (vanilla-vs-buffered) shows 
the mean memory profile of running test_asyncio 1000 times with the buffered GC 
and without the buffered GC. The other file shows the difference between both 
curves.

test_asyncio is not a "production application", but it is the longest and least 
specific test that we have. I have experimented with some tests in the 
pyperformance suite and they show similar behaviour: some memory spikes seem to 
be smaller, and there is less memory usage in flat regions after collections.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


Added file: https://bugs.python.org/file48807/vanilla-vs-buffered.png




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


Added file: https://bugs.python.org/file48808/diff-vanilla-buffered.png




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> Oh, I don't expect it to help appreciably - which is why I never did it ;-)  
> It's aimed at something different, I think, than what you're after:  reducing 
> the burden of cyclic gc on objects that would otherwise soon be reclaimed by 
> refcounting anyway.  But such stuff is going to get reclaimed "soon" 
> regardless, and saving it from getting scanned by gc at most once has limited 
> potential.

What I wanted to say is that this idea is still interesting to benchmark and 
experiment with on its own, especially because it is simple enough.

> You seem aimed more at reclaiming _cyclic_ trash sooner.  The hack I 
> sketched would only help with that if a cycle in part B became 
> "theoretically dead" before part A filled up enough to trigger another 
> collection.  But I have no guess as to the distribution of cycle lifetimes, 
> other than that it's bound to vary a lot across programs, and will be 
> multi-modal.

That is right (I was also looking to solve the case in which an object is 
referenced by something in an older generation, then it is promoted and almost 
immediately the referent dies, or it was dead to begin with). I think evolving 
from your proposal into something in which the first generation is split into 
two "sub-generations" with the same threshold, collected at the same time, 
should not be very complicated.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Tim Peters


Tim Peters  added the comment:

Oh, I don't expect it to help appreciably - which is why I never did it ;-)  
It's aimed at something different, I think, than what you're after:  reducing 
the burden of cyclic gc on objects that would otherwise soon be reclaimed by 
refcounting anyway.  But such stuff is going to get reclaimed "soon" 
regardless, and saving it from getting scanned by gc at most once has limited 
potential.

You seem aimed more at reclaiming _cyclic_ trash sooner.  The hack I sketched 
would only help with that if a cycle in part B became "theoretically dead" 
before part A filled up enough to trigger another collection.  But I have no 
guess as to the distribution of cycle lifetimes, other than that it's bound to 
vary a lot across programs, and will be multi-modal.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> For example, break the youngest generation into parts A & B.  Newly tracked 
> items are added to part A.  When a collection occurs, only part B 
> participates from the youngest generation.  When the collection ends, part A 
> is renamed to part B, and part A is marked empty (just a handful of pointer 
> manipulations).

Well, this seems simple enough to fully implement just to try it out and 
benchmark it a bit. I will start doing some experiments with it.

> Beyond that most objects "die young", which appears overwhelmingly truer than 
> not across almost all Python programs, I'm not sure anything else about 
> lifetime distribution is generally exploitable.

In theory, sub-generation steps will make use of "most objects die young"; 
specifically, they will try to make it happen more often, as this issue can be 
summarized as "some objects that should die young don't, because they were in 
a collection at the wrong time and now they need to wait even longer".

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Joannah Nanjekye


Joannah Nanjekye  added the comment:

> About "5 bits", no, we don't have 'em.  Even the relatively modest > ugliness 
> we have now makes it impossible to port Python to a word->addressed machine 
> (I'm not sure any still exist!).  Nothing in C >guarantees the last 2 or 3 
> bits of a pointer "are 0".  We're already >using two words to store 2 
> persistent pointers, one persistent flag, a >full-width transient reference 
> count, and two transient flags.

In view of this, +1 for avoiding the extra bits for age storage. The steps 
scheme will work best in this case.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Joannah Nanjekye


Joannah Nanjekye  added the comment:

> I suppose we would need to experiment, but from what I have seen I think 
> two or three :) What do you think?

IMO, experimenting with two steps is good enough.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Tim Peters


Tim Peters  added the comment:

Long ago I thought about adding "a buffer in front of" CPython's cyclic gc:  
newly tracked items would be added there, and not participate in cyclic gc at 
all until a collection passed.

For example, break the youngest generation into parts A & B.  Newly tracked 
items are added to part A.  When a collection occurs, only part B participates 
from the youngest generation.  When the collection ends, part A is renamed to 
part B, and part A is marked empty (just a handful of pointer manipulations).
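
Purely as illustration, a toy Python sketch of that A/B idea (the real thing 
would be a few pointer swaps on gc's linked lists in C, not Python lists):

    class YoungestGeneration:
        """Toy model of the A/B "buffering" scheme (illustrative only)."""

        def __init__(self):
            self.part_a = []   # newly tracked objects land here
            self.part_b = []   # only this part participates in a collection

        def track(self, obj):
            self.part_a.append(obj)

        def collect(self, scan):
            # scan() examines part B only and returns its survivors, which
            # the caller promotes to the next generation
            survivors = scan(self.part_b)
            # afterwards: A is renamed to B, and A starts out empty again
            self.part_b = self.part_a
            self.part_a = []
            return survivors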

Seems a similar thrust, but a more extreme goal:  trying to maximize the number 
of new containers that can be reclaimed by refcounting alone without _ever_ 
being chased by cyclic gc.

Beyond that most objects "die young", which appears overwhelmingly truer than 
not across almost all Python programs, I'm not sure anything else about 
lifetime distribution is generally exploitable.

About "5 bits", no, we don't have 'em.  Even the relatively modest ugliness we 
have now makes it impossible to port Python to a word-addressed machine (I'm 
not sure any still exist!).  Nothing in C guarantees the last 2 or 3 bits of a 
pointer "are 0".  We're already using two words to store 2 persistent pointers, 
one persistent flag, a full-width transient reference count, and two transient 
flags.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> How many semi-spaces are you thinking of having in this generation? Most 
> systems have and recommend two.

I suppose we would need to experiment, but from what I have seen I think two 
or three :) What do you think?

> If it is a worse tradeoff than managing steps then I would suggest sticking 
> to the steps.

I think any memory increase that scales with the number of objects will be a 
big concern.

> we need to add a step and manage it, which is more complex than, for 
> example, tracking ages: in case of a change we only change the age 
> threshold, which looks simpler to implement.

Exactly, but if by "tracking ages" you mean storing per-object ages, then the 
previous concern applies :( Adding sub-generation steps should not be much 
more complicated IMHO.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Joannah Nanjekye


Joannah Nanjekye  added the comment:

> 5 bits *per object* is a lot because it scales with the number of objects. 
> It will quickly obliterate any gain that we get from some objects being 
> deallocated sooner (which is what we are trying to achieve). Using 
> intermediate generations has a constant cost that is negligible.

I agree, but this may not require falling back to a two-word header per 
object, as was the original worry. If it is a worse tradeoff than managing 
steps then I would suggest sticking to the steps. Again, in the end it is 
about what tradeoffs we are willing to work with.

> I am not sure what you mean by "changing the age threshold for objects 
> looks simpler than creating and managing steps". If you mean changing the 
> generational threshold, I think that does not fix the problem as it only 
> delays or speeds up collections, but the problem is scale-invariant with 
> the number of objects.

I meant that for any needed increase in promotion delay, we need to add a 
step and manage it, which is more complex than, for example, tracking ages: 
in case of a change we only change the age threshold, which looks simpler to 
implement.

> I would propose maybe doing sub-steps only for the first generation (as 
> many papers suggest that is where the bigger gains are).

Correct, since the premise is that we just want to delay promotion of young 
objects to an older generation so that they have enough time to die. The 
steps should reasonably be in the first generation.

How many semi-spaces are you thinking of having in this generation? Most 
systems have and recommend two.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado

Pablo Galindo Salgado  added the comment:

> I think we only need about 5 bits to store the age of each object; do we 
> really need to increase the object back to two words per object?

5 bits *per object* is a lot because it scales with the number of objects. It 
will quickly obliterate any gain that we get from some objects being 
deallocated sooner (which is what we are trying to achieve). Using 
intermediate generations has a constant cost that is negligible.
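
Back-of-the-envelope, under the assumption that word alignment turns those 
5 bits into a full extra 8-byte word per object:

    objects = 10_000_000                # hypothetical heap population
    extra_bytes = objects * 8           # one extra aligned word each
    print(extra_bytes / 2**20, "MiB of additional header space")  # ~76 MiB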

> I am actually in favour of Shaw's steps scheme, though it may also mean 
> that the more delay we want, the more steps we need to create and manage, 
> while changing the age threshold for objects looks simpler than creating 
> and managing steps.

I would propose maybe doing sub-steps only for the first generation (as many 
papers suggest that is where the bigger gains are). I am not sure what you 
mean by "changing the age threshold for objects looks simpler than creating 
and managing steps". If you mean changing the generational threshold, I think 
that does not fix the problem as it only delays or speeds up collections, but 
the problem is scale-invariant with the number of objects.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Joannah Nanjekye

Joannah Nanjekye  added the comment:

> What threshold is this?

> This is the different thresholds for the generations that you can get 
> using gc.get_threshold(). They are in relationship to the number of objects 
> in every generation (there are slightly different rules for the oldest 
> generation).

I think we are on the same wavelength in terms of thresholds. Promotion is 
currently based on the number of objects accumulated in the generation, if I 
interpreted your explanation right.


> I am not sure what you mean by "copy" here. Moving objects between 
> generations or spaces is just updating the pointers of the linked lists. 
> Objects are not "copied" but "moved".

Yes, in terms of implementation, so I am thinking there must be a cost in 
moving these objects. IIRC the GC handbook you referenced may have noted this 
too. I stand to be corrected on this, though.

> The reason I am proposing sub-generation steps is that making the object 
> header bigger is a price too high to pay for this problem. We have recently 
> gone to notable efforts to reduce one word per object by complicating the 
> code substantially using tagged pointers, so recording the age in the 
> object seems like the opposite direction.

I think we only need about 5 bits to store the age of each object; do we 
really need to increase the object back to two words per object?

> As I mentioned before, recording per-object age will probably be a no-go 
> (the solution to this problem will yield memory gains because objects won't 
> suffer "generational nepotism", but making the object header bigger will 
> annihilate those gains), so that is the reason I proposed generational 
> sub-steps.

I am actually in favour of Shaw's steps scheme, though it may also mean that 
the more delay we want, the more steps we need to create and manage, while 
changing the age threshold for objects looks simpler than creating and 
managing steps.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

> What threshold is this?

This is the different thresholds for the generations that you can get using 
gc.get_threshold(). They are in relationship to the number of objects in 
every generation (there are slightly different rules for the oldest 
generation).
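
Concretely (the numbers below are just the current CPython defaults and can 
be tuned with gc.set_threshold()):

    import gc

    print(gc.get_threshold())   # (700, 10, 10) by default
    # gen 0 is collected once allocations exceed deallocations by 700;
    # gen 1 after every 10 gen-0 collections; gen 2 roughly after every
    # 10 gen-1 collections, subject to extra rules for the oldest
    # generation.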

> This method is good but we will still incur the overhead of copying objects 
> between the semi-spaces.

I am not sure what you mean by "copy" here. Moving objects between 
generations or spaces is just updating the pointers of the linked lists. 
Objects are not "copied" but "moved".

> I have looked at a similar problem before for a different runtime and 
> would instead consider basing promotion on the number of collections 
> survived by objects in the young generation. This requires using five bits 
> from one of the two header words of each object to record its age. The 
> objects that reach the age threshold are promoted to the old generation. 
> This will give the young objects enough time to die, thereby reducing how 
> often objects are promoted to the old generation. This in turn reduces the 
> frequency of major collections and hence the pause times.

The reason I am proposing sub-generation steps is that making the object 
header bigger is a price too high to pay for this problem. We have recently 
gone to notable efforts to reduce one word per object by complicating the 
code substantially using tagged pointers, so recording the age in the object 
seems like the opposite direction.

> I would only consider the generational semi-spaces after we have explored 
> basing promotion on the age of an object in a generation. If this doesn't 
> work, then we can use the bucket brigade scheme.

As I mentioned before, recording per-object age will probably be a no-go 
(the solution to this problem will yield memory gains because objects won't 
suffer "generational nepotism", but making the object header bigger will 
annihilate those gains), so that is the reason I proposed generational 
sub-steps.

--




[issue39143] Implementing sub-generation steps in the gc

2019-12-28 Thread Joannah Nanjekye

Joannah Nanjekye  added the comment:

> The most common reason is that when promotions of the youngest 
> generations happen, some very young objects that just arrived in the 
> generation are promoted because we have reached a threshold, and their 
> death will be delayed.

What threshold is this? Is it based on the number of objects in the young 
generation, in that if the number of objects in the young generation reaches 
some value, then promotion occurs? If this is the case, we can use another 
scheme for a start before exploring steps, IMHO.

Using Shaw's bucket brigade scheme of semi-spaces, or generational steps, is 
good, but this means a generation with n steps guarantees n scavenges before 
an object is promoted to the next generation. The surviving objects are 
copied between these pairs of semi-spaces b times before they are promoted to 
the next step. An n-bucket scheme guarantees that objects will be promoted to 
the old generation if they have survived between nb-1 and nb scavenges. For 
example, in a 2-bucket scheme, objects are copied up to three times before 
being promoted to the next generation. This method is good but we will still 
incur the overhead of copying objects between the semi-spaces.
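
A toy sketch of the brigade, with hypothetical names (a CPython version 
would splice linked lists rather than copy between Python lists):

    class YoungGeneration:
        """Toy n-bucket brigade: newborns enter bucket 0 and survivors
        march one bucket forward per scavenge until they leave."""

        def __init__(self, n_buckets=2):
            self.buckets = [[] for _ in range(n_buckets)]

        def track(self, obj):
            self.buckets[0].append(obj)    # newly tracked objects

        def scavenge(self, survives, promote):
            # survivors of the last bucket are promoted out of the
            # generation entirely
            for obj in self.buckets[-1]:
                if survives(obj):
                    promote(obj)
            # everything else ages by exactly one bucket, oldest first
            for i in range(len(self.buckets) - 1, 0, -1):
                self.buckets[i] = [o for o in self.buckets[i - 1]
                                   if survives(o)]
            self.buckets[0] = []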

I have looked at a similar problem before for a different runtime and would 
instead consider basing promotion on the number of collections survived by 
objects in the young generation. This requires using five bits from one of 
the two header words of each object to record its age. The objects that reach 
the age threshold are promoted to the old generation. This will give the 
young objects enough time to die, thereby reducing how often objects are 
promoted to the old generation. This in turn reduces the frequency of major 
collections and hence the pause times.
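
A toy sketch of that age-based scheme (PROMOTE_AGE is a hypothetical tuning 
knob, and the cap mirrors a 5-bit field):

    PROMOTE_AGE = 3

    def scavenge(young, old, survives):
        """young holds (object, age) pairs; survivors age by one and are
        promoted once they have lived through PROMOTE_AGE scavenges."""
        remaining = []
        for obj, age in young:
            if not survives(obj):
                continue                      # dead: reclaimed, drop it
            age = min(age + 1, 31)            # 5 bits => age caps at 31
            if age >= PROMOTE_AGE:
                old.append(obj)               # old enough: promote
            else:
                remaining.append((obj, age))  # keep aging in the nursery
        return remaining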

I would only consider the generational semi-spaces after we have explored 
basing promotion on the age of an object in a generation. If this doesn't work, 
then we can use the bucket brigade scheme.

--
nosy: +nanjekyejoannah




[issue39143] Implementing sub-generation steps in the gc

2019-12-27 Thread Pablo Galindo Salgado

New submission from Pablo Galindo Salgado :

While I was re-reading The Garbage Collection Handbook [Jones, Hosking, 
Moss], I have been doing some benchmarks of different aspects of our garbage 
collector and taking statistics using the pyperformance suite, as well as 
several "production" applications that heavily create objects. I have 
observed that our collector presents a form of "generational nepotism" in 
some typical scenarios. The most common reason is that when promotions of the 
youngest generations happen, some very young objects that just arrived in the 
generation are promoted because we have reached a threshold, and their death 
will be delayed. These objects would likely die immediately if the promotion 
was delayed a bit (think of a burst of temporary objects being created at 
once) or if the promotion could distinguish "age" with more granularity.

The book proposes several solutions to this problem, and one of the simplest, 
given the architecture of our collector, is to use sub-generation steps:

> Promotion can be delayed by structuring a generation into two or more 
> aging spaces. This allows objects to be copied between the fromspace and 
> tospace an arbitrary number of times within the generation before they are 
> promoted. In Lieberman and Hewitt's original generational collector [1983], 
> a generation is collected several times before all survivors are eventually 
> promoted en masse. In terms of the aging semispaces of Figure 9.2b, either 
> all live objects in fromspace are evacuated to tospace within this 
> generation or all are promoted to the next generation, depending on the age 
> of the generation as a whole.

Basically, the differences between steps and generations are that both 
segregate objects by age, but different generations are collected at 
different frequencies whereas all the steps of a generation are collected at 
the same time. By using steps in the youngest generation (where most mutation 
occurs), and by reducing premature collection, the load on the write barrier 
can be reduced while also controlling promotion, without need for per-object 
age records.

--

What do you think about implementing sub-generation steps? Maybe only on the 
youngest generation?

A "lazy" way of implementing this (although without the semantic of "steps") is 
adding more intermediate generations with the same threshold (but this likely 
won't yield the same benefits).

--
components: Interpreter Core
messages: 358916
nosy: nascheme, pablogsal, tim.peters
priority: normal
severity: normal
status: open
title: Implementing sub-generation steps in the gc
type: enhancement
versions: Python 3.9
