Heh.  I wasn't intending to be nasty, but this program makes our arena
recycling look _much_ worse than memcrunch.py does.  It cycles through
phases.  In each phase, it first creates a large randomish number of
objects, then deletes half of all objects in existence.  Except that
every 10th phase, it deletes 90% instead.  It's written to go through
100 phases, but I killed it after 10 because it was obviously going to
keep on growing without bound.

Note 1:  to do anything deterministic with obmalloc stats these days
appears to require setting the envar PYTHONHASHSEED to 0 before
running (else stats vary even by the time you get to an interactive
prompt).
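For the record, here is a minimal sketch of pinning the seed from Python instead of the shell. It assumes you launch the stress script (or any snippet) in a child interpreter. `run_with_fixed_hash_seed` is a hypothetical helper. The check only shows that str hashes repeat across runs once the seed is pinned; it says nothing by itself about obmalloc stats.

```python
import os
import subprocess
import sys


def run_with_fixed_hash_seed(code: str, seed: str = "0") -> str:
    """Hypothetical helper: run `code` in a fresh interpreter with
    PYTHONHASHSEED pinned, and return whatever it printed to stdout."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# With the seed pinned, str hashes (and so set/dict layouts) repeat run to run.
first = run_with_fixed_hash_seed("print(hash('arena'))")
second = run_with_fixed_hash_seed("print(hash('arena'))")
print(first == second)
```

From a shell the equivalent is just `PYTHONHASHSEED=0 python script.py`.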

Note 2:  there are 3 heavily used size classes here, for ints,
2-tuples, and class instances, of byte sizes 32, 64, and 96 on 64-bit
boxes, under my PR and under released 3.7.3.
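A rough sketch of how a request maps to one of those size classes, assuming obmalloc simply rounds the request up to a multiple of its alignment (16 bytes on 64-bit boxes starting with 3.8, 8 bytes in 3.7). Requests above the small-request threshold (512 bytes in 3.8) bypass obmalloc entirely. `obmalloc_block_size` is a hypothetical name. `sys.getsizeof` adds the GC header for tracked objects, so its numbers only approximate the real request sizes, and they vary across CPython versions.

```python
import sys


def obmalloc_block_size(nbytes: int, alignment: int = 16) -> int:
    """Hypothetical helper: round a small request up to its size class.

    Defaults to the 3.8-style 16-byte alignment; pass alignment=8 for 3.7.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    return (nbytes + alignment - 1) // alignment * alignment


class Stuff:
    __slots__ = tuple("abcdefg")


# Approximate request sizes for the three object kinds the stress test creates.
for label, obj in [("int", 12_345_678), ("2-tuple", (1, Stuff())), ("Stuff", Stuff())]:
    size = sys.getsizeof(obj)
    print(f"{label:8} getsizeof={size:3}  block={obmalloc_block_size(size)}")
```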

First with my branch, after phase 10 finishes building objects:

phase 10 adding 9953410
phase 10 has 16743920 objects

# arenas allocated total           =                3,114
# arenas reclaimed                 =                    0
# arenas highwater mark            =                3,114
# arenas allocated current         =                3,114
3114 arenas * 1048576 bytes/arena  =        3,265,265,664

# bytes in allocated blocks        =        3,216,332,784

No arenas have ever been reclaimed, but space utilization is excellent
(about 98.5% of arena bytes are occupied by objects).
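If you want to track these counters across phases without eyeballing them, a small parser will do. `sys._debugmallocstats()` writes directly to the C-level stderr, so `contextlib.redirect_stderr` won't catch it. One option is to run the script in a subprocess with stderr captured and feed the text to something like the sketch below. `parse_arena_stats` is hypothetical, and the regex assumes the line layout shown in the dumps in this message.

```python
import re

# Matches lines like "# arenas allocated current         =      3,114".
STAT_LINE = re.compile(
    r"^# (arenas [a-z ]+?|bytes in allocated blocks)\s*=\s*([\d,]+)\s*$"
)


def parse_arena_stats(text: str) -> dict:
    """Hypothetical helper: pull arena counters out of one stats dump."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2).replace(",", ""))
    return stats


sample = """\
# arenas allocated total           =                3,114
# arenas reclaimed                 =                    0
# arenas highwater mark            =                3,114
# arenas allocated current         =                3,114
3114 arenas * 1048576 bytes/arena  =        3,265,265,664

# bytes in allocated blocks        =        3,216,332,784
"""
print(parse_arena_stats(sample))
```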

Then after phase 10 deletes 90% of everything still alive:

phase 10 deleting factor 90% 15069528
phase 10 done deleting

# arenas allocated total           =                3,114
# arenas reclaimed                 =                    0
# arenas highwater mark            =                3,114
# arenas allocated current         =                3,114
3114 arenas * 1048576 bytes/arena  =        3,265,265,664

# bytes in allocated blocks        =          323,111,488

Still no arenas have been released, and space utilization is horrid.
A bit less than 10% of allocated arena space is being used for objects.
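The utilization figures are just "bytes in allocated blocks" divided by total arena bytes. Here is that arithmetic as a sketch, using the PR's 1 MiB arena size from the stats above. `arena_utilization` is a hypothetical name.

```python
def arena_utilization(allocated_block_bytes: int, arenas: int, arena_size: int) -> float:
    """Fraction of arena memory actually occupied by live blocks."""
    return allocated_block_bytes / (arenas * arena_size)


ARENA_SIZE_PR = 1 << 20  # 1 MiB arenas on the PR branch

before = arena_utilization(3_216_332_784, 3_114, ARENA_SIZE_PR)
after = arena_utilization(323_111_488, 3_114, ARENA_SIZE_PR)
print(f"after building: {before:.1%}")  # 98.5%
print(f"after deleting: {after:.1%}")   # 9.9%
```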

Now under 3.7.3.  First when phase 10 is done building:

phase 10 adding 9953410
phase 10 has 16743920 objects

# arenas allocated total           =               14,485
# arenas reclaimed                 =                2,020
# arenas highwater mark            =               12,465
# arenas allocated current         =               12,465
12465 arenas * 262144 bytes/arena  =        3,267,624,960

# bytes in allocated blocks        =        3,216,219,656

Space utilization is again excellent.  A significant number of arenas
were reclaimed - but usefully?  Let's see how things turn out after
phase 10 ends deleting 90% of the objects:

phase 10 deleting factor 90% 15069528
phase 10 done deleting

# arenas allocated total           =               14,485
# arenas reclaimed                 =                2,020
# arenas highwater mark            =               12,465
# arenas allocated current         =               12,465
12465 arenas * 262144 bytes/arena  =        3,267,624,960

# bytes in allocated blocks        =          322,998,360

Didn't manage to reclaim anything!  Space utilization is again
horrid, and it's actually consuming slightly more arena memory than
when running under the PR.
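To put "slightly more" in numbers: 3.7.3 uses 256 KiB arenas and the PR uses 1 MiB arenas. Multiplying out the current-arena counts from the two post-deletion dumps gives the following.

```python
MIB = 1 << 20
ARENA_373 = 256 * 1024  # 3.7.3 arena size

pr_bytes = 3_114 * MIB         # PR: 1 MiB arenas
old_bytes = 12_465 * ARENA_373  # 3.7.3: 256 KiB arenas

extra = old_bytes - pr_bytes
print(extra, extra // ARENA_373)         # 2359296 9
print(f"{322_998_360 / old_bytes:.1%}")  # 9.9% utilization under 3.7.3
```

So 3.7.3 holds 9 more 256 KiB arenas (2.25 MiB) than the PR, at essentially the same under-10% utilization.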

Which is just more of what I've been seeing over & over:  3.7.3 and
the PR both do a fine job of recycling arenas, or a horrid job,
depending on the program.

For excellent recycling, change this program to use a dict instead of a set.  So

    data = {}

at the start, fill it with

    data[serial] = Stuff()

and change

    data.pop()

to use .popitem().
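A minimal sketch of that dict variant, scaled down so it runs in seconds. `run_dict_variant` and its parameters are hypothetical, and `lo` defaults to 5,000 rather than 5,000,000. Stats dumping is off by default; `sys._debugmallocstats()` is CPython-only.

```python
import random
import sys
from random import randrange


def run_dict_variant(phases: int = 10, lo: int = 5_000, dump_stats: bool = False) -> int:
    """Hypothetical, scaled-down dict/popitem() version of the stress test.

    Returns how many objects survive the final phase.
    """
    class Stuff:
        __slots__ = tuple("abcdefg")  # same 7 slots as the original Stuff

    hi = lo * 2
    data = {}  # dict instead of set
    serial = 0
    random.seed(42)
    for phase in range(1, phases + 1):
        for _ in range(randrange(lo, hi)):
            data[serial] = Stuff()  # serial is the key, the instance the value
            serial += 1
        factor = 0.5 if phase % 10 else 0.9
        for _ in range(int(len(data) * factor)):
            data.popitem()  # LIFO: drops the most recently inserted entry
        if dump_stats:
            sys._debugmallocstats()
    return len(data)


print(run_dict_variant())
```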

The difference is that set.pop() removes elements in effectively
pseudo-random order, while dicts preserve insertion order.  So
data.popitem() removes the most recently added dict entry, and the
program is then just modeling stack (LIFO) allocation/deallocation.
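A tiny illustration of the LIFO behavior relied on here. Since 3.7, `dict.popitem()` is guaranteed to remove the last-inserted pair. This doesn't show set order; for the (serial, Stuff()) tuples in the script that order depends on id-based hashes.

```python
d = {}
for serial in range(5):
    d[serial] = f"obj{serial}"

popped = [d.popitem()[0] for _ in range(3)]
print(popped)   # [4, 3, 2]
print(list(d))  # [0, 1]
```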

def doit():
    import random
    from random import randrange
    import sys

    class Stuff:
        # add cruft so it takes 96 bytes under 3.7 and 3.8
        __slots__ = tuple("abcdefg")

        def __hash__(self):
            return hash(id(self))

    LO = 5_000_000
    HI = LO * 2
    data = set()
    serial = 0
    random.seed(42)

    for phase in range(1, 101):
        # Grow: each element is a (serial, Stuff()) 2-tuple, so every add
        # allocates an int, a 2-tuple, and a Stuff instance.
        toadd = randrange(LO, HI)
        print("phase", phase, "adding", toadd)
        for _ in range(toadd):
            data.add((serial, Stuff()))
            serial += 1
        print("phase", phase, "has", len(data), "objects")
        # CPython-only; dumps obmalloc stats to the C-level stderr.
        sys._debugmallocstats()
        # Shrink: delete half of what's alive, but 90% every 10th phase.
        factor = 0.5 if phase % 10 else 0.9
        todelete = int(len(data) * factor)
        print(f"phase {phase} deleting factor {factor:.0%} {todelete}")
        for _ in range(todelete):
            # set.pop() removes an arbitrary element; with id-based hashes
            # that's effectively a random survivor each time.
            data.pop()
        print("phase", phase, "done deleting")
        sys._debugmallocstats()

doit()
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZTLJGXEM7NCASL5NVGMRMDN3O4GGUEIX/