Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-23 Thread Marko Rauhamaa
Gregory Ewing wrote:

> Lawrence D’Oliveiro wrote:
>> what WOULD you consider to be so “representative”?
>
> I don't claim any of them to be representative. Different GC
> strategies have different characteristics.

My experience with HotSpot was a bit disheartening. GC is a winning
concept provided that you don't have to strategize too much. In
practice, though, tweaking the GC parameters seems to be a frequent necessity.

On the other hand, I believe much of the trouble comes from storing too
much information in the heap. Applications shouldn't keep semipersistent,
multigigabyte lookup structures in RAM, at least not as numerous
small objects.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-23 Thread Gregory Ewing

Lawrence D’Oliveiro wrote:

what WOULD you consider to be so “representative”?


I don't claim any of them to be representative. Different GC
strategies have different characteristics.

--
Greg


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-22 Thread CFK
On Jun 22, 2017 4:03 PM, "Chris Angelico" wrote:

On Fri, Jun 23, 2017 at 5:22 AM, CFK wrote:
> On Jun 22, 2017 9:32 AM, "Chris Angelico" wrote:
>
> On Thu, Jun 22, 2017 at 11:24 PM, CFK wrote:
>> When
>> I draw memory usage graphs, I see sawtooth waves to the memory usage which
>> suggest that the garbage builds up until the GC kicks in and reaps the
>> garbage.
>
> Interesting. How do you actually measure this memory usage? Often,
> when a GC frees up memory, it's merely made available for subsequent
> allocations, rather than actually given back to the system - all it
> takes is one still-used object on a page and the whole page has to be
> retained.
>
> As such, a "create and drop" usage model would tend to result in
> memory usage going up for a while, but then remaining stable, as all
> allocations are being fulfilled from previously-released memory that's
> still owned by the process.
>
>
> I'm measuring it using a bit of a hack; I use psutil.Popen
> (https://pypi.python.org/pypi/psutil) to open a simulation as a child
> process, and in a tight loop gather the size of the resident set and the
> number of virtual pages currently in use of the child. The sawtooths are
> about 10% (and decreasing) of the size of the overall memory usage, and are
> probably due to different stages of the simulation doing different things.
> That is an educated guess though, I don't have strong evidence to back it
> up.
>
> And, yes, what you describe is pretty close to what I'm seeing. The longer
> the simulation has been running, the smoother the memory usage gets.

Ah, I think I understand. So the code would be something like this:

Phase one:
Create a bunch of objects
Do a bunch of simulation
Destroy a bunch of objects
Simulate more
Destroy all the objects used in this phase, other than the result

Phase two:
Like phase one

In that case, yes, it's entirely possible that the end of a phase
could signal a complete cleanup of intermediate state, with the
consequent release of memory to the system. (Or, more likely, a
near-complete cleanup, with release of MOST of memory.)

Very cool bit of analysis you've done there.


Thank you! And, yes, that is essentially what is going on (or was in that
version of the simulator; I'm in the middle of a big refactor to speed
things up and expect the memory usage patterns to change).

Thanks,
Cem Karan


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-22 Thread Chris Angelico
On Fri, Jun 23, 2017 at 5:22 AM, CFK wrote:
> On Jun 22, 2017 9:32 AM, "Chris Angelico" wrote:
>
> On Thu, Jun 22, 2017 at 11:24 PM, CFK wrote:
>> When
>> I draw memory usage graphs, I see sawtooth waves to the memory usage which
>> suggest that the garbage builds up until the GC kicks in and reaps the
>> garbage.
>
> Interesting. How do you actually measure this memory usage? Often,
> when a GC frees up memory, it's merely made available for subsequent
> allocations, rather than actually given back to the system - all it
> takes is one still-used object on a page and the whole page has to be
> retained.
>
> As such, a "create and drop" usage model would tend to result in
> memory usage going up for a while, but then remaining stable, as all
> allocations are being fulfilled from previously-released memory that's
> still owned by the process.
>
>
> I'm measuring it using a bit of a hack; I use psutil.Popen
> (https://pypi.python.org/pypi/psutil) to open a simulation as a child
> process, and in a tight loop gather the size of the resident set and the
> number of virtual pages currently in use of the child. The sawtooths are
> about 10% (and decreasing) of the size of the overall memory usage, and are
> probably due to different stages of the simulation doing different things.
> That is an educated guess though, I don't have strong evidence to back it
> up.
>
> And, yes, what you describe is pretty close to what I'm seeing. The longer
> the simulation has been running, the smoother the memory usage gets.

Ah, I think I understand. So the code would be something like this:

Phase one:
Create a bunch of objects
Do a bunch of simulation
Destroy a bunch of objects
Simulate more
Destroy all the objects used in this phase, other than the result

Phase two:
Like phase one
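In runnable form, that pattern might look something like this (a toy sketch with made-up object shapes, not the actual simulator code):

```python
import gc

def run_phase(n):
    """Create many intermediate objects, 'simulate', keep only the result."""
    intermediates = [{"id": i, "payload": [0] * 100} for i in range(n)]
    result = sum(len(obj["payload"]) for obj in intermediates)  # stand-in work
    del intermediates   # drop everything except the result
    gc.collect()        # reap any cycles; acyclic objects died on the del
    return result

# Phase one, phase two, ...: memory climbs within a phase, then is released.
results = [run_phase(10_000) for _ in range(3)]
print(results)  # [1000000, 1000000, 1000000]
```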

In that case, yes, it's entirely possible that the end of a phase
could signal a complete cleanup of intermediate state, with the
consequent release of memory to the system. (Or, more likely, a
near-complete cleanup, with release of MOST of memory.)

Very cool bit of analysis you've done there.

ChrisA


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-22 Thread CFK
On Jun 22, 2017 9:32 AM, "Chris Angelico" wrote:

On Thu, Jun 22, 2017 at 11:24 PM, CFK wrote:
> When
> I draw memory usage graphs, I see sawtooth waves to the memory usage which
> suggest that the garbage builds up until the GC kicks in and reaps the
> garbage.

Interesting. How do you actually measure this memory usage? Often,
when a GC frees up memory, it's merely made available for subsequent
allocations, rather than actually given back to the system - all it
takes is one still-used object on a page and the whole page has to be
retained.

As such, a "create and drop" usage model would tend to result in
memory usage going up for a while, but then remaining stable, as all
allocations are being fulfilled from previously-released memory that's
still owned by the process.


I'm measuring it using a bit of a hack; I use psutil.Popen
(https://pypi.python.org/pypi/psutil) to open a simulation as a child
process, and in a tight loop gather the size of the resident set and the
number of virtual pages currently in use of the child. The sawtooths are
about 10% (and decreasing) of the size of the overall memory usage, and are
probably due to different stages of the simulation doing different things.
That is an educated guess though, I don't have strong evidence to back it
up.
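For anyone who wants to reproduce the measurement: psutil's memory_info() is the portable way to do it, but on Linux the same two numbers can be read straight from /proc/<pid>/statm with only the standard library. A rough sketch (the sampled command and interval are placeholders, not my actual simulation):

```python
import subprocess
import sys
import time

def read_statm(pid):
    """Return (total_pages, resident_pages) for a process (Linux-only)."""
    with open(f"/proc/{pid}/statm") as f:
        size, resident = f.read().split()[:2]
    return int(size), int(resident)

def sample_child_memory(cmd, interval=0.05):
    """Launch cmd as a child and sample its memory until it exits."""
    samples = []
    child = subprocess.Popen(cmd)
    while child.poll() is None:
        try:
            samples.append((time.monotonic(), *read_statm(child.pid)))
        except FileNotFoundError:  # child exited between poll() and the read
            break
        time.sleep(interval)
    child.wait()
    return samples

# Placeholder workload; plot the second and third columns to see the sawtooth.
samples = sample_child_memory([sys.executable, "-c", "import time; time.sleep(0.3)"])
```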

And, yes, what you describe is pretty close to what I'm seeing. The longer
the simulation has been running, the smoother the memory usage gets.

Thanks,
Cem Karan


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-22 Thread Chris Angelico
On Thu, Jun 22, 2017 at 11:24 PM, CFK wrote:
> When
> I draw memory usage graphs, I see sawtooth waves to the memory usage which
> suggest that the garbage builds up until the GC kicks in and reaps the
> garbage.

Interesting. How do you actually measure this memory usage? Often,
when a GC frees up memory, it's merely made available for subsequent
allocations, rather than actually given back to the system - all it
takes is one still-used object on a page and the whole page has to be
retained.

As such, a "create and drop" usage model would tend to result in
memory usage going up for a while, but then remaining stable, as all
allocations are being fulfilled from previously-released memory that's
still owned by the process.

ChrisA


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-22 Thread CFK
On Jun 22, 2017 12:38 AM, "Paul Rubin" wrote:

Lawrence D’Oliveiro writes:
> while “memory footprint” depends on how much memory is actually being
> retained in accessible objects.

If the object won't be re-accessed but is still retained by gc, then
refcounting won't free it either.

> Once again: The trouble with GC is, it doesn’t know when to kick in:
> it just keeps on allocating memory until it runs out.

When was the last time you encountered a problem like that in practice?
It's almost never an issue. "Runs out" means it has reached an allocation
threshold that's usually much smaller than the program's memory region.
And as you say, you can always manually trigger a gc if the need arises.


I'm with Paul and Steve on this. I've had to do a **lot** of profiling on
my simulator to get it to run at a reasonable speed. Memory usage seems to
follow an exponential curve, leveling off at a strict maximum that strongly
correlates with the number of live objects in a given simulation run. When
I draw memory usage graphs, I see sawtooth waves to the memory usage which
suggest that the garbage builds up until the GC kicks in and reaps the
garbage.  In short, only an exceptionally poorly written GC would exhaust
memory before reaping garbage.

Thanks,
Cem Karan


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-21 Thread Paul Rubin
Lawrence D’Oliveiro writes:
> while “memory footprint” depends on how much memory is actually being
> retained in accessible objects.

If the object won't be re-accessed but is still retained by gc, then
refcounting won't free it either.

> Once again: The trouble with GC is, it doesn’t know when to kick in:
> it just keeps on allocating memory until it runs out.

When was the last time you encountered a problem like that in practice?
It's almost never an issue. "Runs out" means it has reached an allocation
threshold that's usually much smaller than the program's memory region.
And as you say, you can always manually trigger a gc if the need arises.
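For instance, CPython's cycle collector can be invoked by hand with gc.collect() (Node here is just an illustrative class):

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

# Build a reference cycle that reference counting alone cannot reclaim.
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b                    # the cycle keeps both refcounts above zero

unreachable = gc.collect()  # manual trigger; returns the unreachable count
print(unreachable)          # at least the two Nodes (plus their __dict__s)
```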


Re: Progress on the Gilectomy (Posting On Python-List Prohibited)

2017-06-21 Thread Steve D'Aprano
On Thu, 22 Jun 2017 10:30 am, Lawrence D’Oliveiro wrote:

> Once again: The trouble with GC is, it doesn’t know when to kick in: it just
> keeps on allocating memory until it runs out.

Once again: no it doesn't.


Are you aware that CPython has a GC? (Or rather, a *second* GC, apart from the
reference counter.) It runs periodically to reclaim dead objects in cycles that
the reference counter won't free. It runs whenever the number of allocations
minus the number of deallocations exceeds certain thresholds, and you can set
and query the thresholds using:

gc.set_threshold

gc.get_threshold
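A quick check of those knobs (the exact default values vary across CPython releases, so don't rely on them):

```python
import gc

print(gc.get_threshold())  # e.g. (700, 10, 10) on many CPython releases

# Make generation-0 collections more frequent: every ~100 net allocations.
old = gc.get_threshold()
gc.set_threshold(100, 10, 10)
assert gc.get_threshold() == (100, 10, 10)

# get_count() shows allocations-minus-deallocations per generation so far.
print(gc.get_count())

gc.set_threshold(*old)     # restore the previous settings
```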


CPython alone disproves your assertion that GCs "keep on allocating memory until
it runs out". Are you aware that there are more than one garbage collection
algorithm? Apart from reference-counting GC, there are also "mark and sweep"
GCs, generational GCs (like CPython's), real-time algorithms, and more.

One real-time algorithm implicitly divides memory into two halves. When one half
is half-full, it moves all the live objects into the other half, freeing up the
first half.
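That two-halves scheme (a semispace copying collector, as in Cheney's algorithm) can be sketched in a few lines of Python. This is only a toy: lists stand in for the two halves, and the copy is depth-first rather than Cheney's breadth-first queue, but the forwarding-pointer trick is the real one and handles cycles:

```python
class Obj:
    def __init__(self, value, refs=()):
        self.value = value
        self.refs = list(refs)   # outgoing references to other Objs
        self.forward = None      # forwarding pointer, set once copied

def collect(from_space, roots):
    """Copy every object reachable from roots into a fresh to-space."""
    to_space = []

    def copy(obj):
        if obj.forward is None:          # not moved yet
            clone = Obj(obj.value)
            obj.forward = clone          # set first, so cycles terminate
            to_space.append(clone)
            clone.refs = [copy(r) for r in obj.refs]
        return obj.forward

    new_roots = [copy(r) for r in roots]
    return to_space, new_roots           # everything left in from_space is garbage

heap = [Obj(i) for i in range(6)]
heap[0].refs = [heap[1]]
heap[1].refs = [heap[2]]
new_heap, new_roots = collect(heap, [heap[0]])
print(len(new_heap))  # 3: only the chain 0 -> 1 -> 2 survives
```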

The Mercury programming language even has a *compile time* garbage collector
that can determine when an object can be freed during compilation -- no sweeps
or reference counting required.

It may be that *some* (possibly toy) GC algorithms behave as you say, only
running when memory is completely full. But your belief that *all* GC
algorithms behave this way is simply wrong.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
