charlie strauss wrote:
Below is a simple program that will cause python to intermittently
stop executing for a few seconds. it's 100% reproducible on my machine.
I'd be tempted to say this is a nasty garbage collection performance
issue except that there is no major memory to be garbage collected in
this script. I'd be tempted to say it was a unix virtual memory
issue except this is occuring at around 1/5th of my physical memory
size. So something else unexplained is going on
Class Foo instances create and hold a list of size nfoo of integers.
(nfoo =10)
Class Bar instances create and hold a list of size nbar of Foo
objects. (nbar =100)
When the code runs it starts creating and storing Bar objects in a
list while watching for real-time glitches in how long it takes to
create the next Foo object. If this is longer than 1/2 of a second
then it reports it.
On my computer after creating 1500 Bar objects, the rate of
creation of new Foo suddenly has a periodic glitch. This glitch re-
occurs about every 300 Bar Objects, and the duration of the glitch
keeps getting longer--growing to many seconds!!!!
Platform: 800Mhz powermac g 4 1Gb of memory
python: python 2.4.2
Note: since I an using absolute time threshold for reporting the
glitches, the first one may take more iterations before it occurs on
a fast computer. You may need to increase nbar or nfoo.
import sys
from time import time
class Foo(object):
def __init__(me,nfoo):
me.memory = [1]*nfoo
class Bar(object):
def __init__(me,nbar,nfoo):
tgx.set_tag("Bar") # timer
b = [None]*nbar
for i in xrange(nbar):
b[i]=Foo(nfoo) # make a foo, add it to list
tgx.jump("Bar"+`i`) #timer
me.b = b #store the list in my instance memory
# just a utility class to time things.
class gtime:
def __init__(me,f=sys.stderr):
me.last = time()
me.f=f
me.tag = ""
me.ticks = 0
def set_tag(me,tag):
me.tag = tag
me.ticks = 0
def reset(me):
me.ticks = 0
def jump(me,tag="NONE",allowed_gap=0.5):
me.t = time()
me.ticks +=1
if me.t-me.last>allowed_gap:
print >>me.f,"Big Gap:",me.t-me.last,"seconds
",me.tag,tag,me.ticks
me.last = time()
tgx = gtime() # instance of the timer
# main loop
nbar = 100
nfoo = 10
ad_nauseum = 20000
final = [None]*ad_nauseum
for i in xrange(ad_nauseum ):
if i%100 == 0 :print >>sys.stderr,"Bars Made: ",i
final[i] = Bar(nbar,nfoo)
I'm not the Timbot, but the behaviour of the program seems reasonably
comprehensible to me. You create a list of 20,000 elements and proceed
to populate it with Bar objects, each of which contains a
thousand-element list pointing to a ten-element list.
So despite the fact that all list elements are (references to) the
integer 1, there are a lot of references, which take up a lot of memory.
Each time you report on the number of Bars you've created 100,000
references to 1.
So it's hardly surprising that at some point the garbage collector cuts
in and starts looking for free memory, which takes a certain amount of time.
It's also unsurprising that the garbage collector takes longer the more
stuff you have in memory, because it has to sweep all those structures
to make sure that there aren't any collectible cycles.
As I run the program I see virtual memory usage climbing steadily -
hardly surprising given the nature of the program. At some point the
working set of the program is going to exceed the amount of physical
memory that the operating system can allocate it.
At that point paging will start to occur, and garbage collection (which
has to access more or less all of memory) is now triggering disk
activity, making it slower by a significant margin. This didn't happen
on my machine with your initial settings as I have plenty of memory, but
when I increased the foo size to 100, by the time I'd allocated 13,000
bars I was seeing big big gaps and little big gaps, the big ones
apparently being required by the need to page something else out to
allocate additional space for the task:
Bars Made: 12900
Big Gap: 3.78899979591 seconds Bar Bar95 96
Bars Made: 13000
Big Gap: 0.500999927521 seconds Bar Bar89 90
Bars Made: 13100
Bars Made: 13200
Big Gap: 3.72300004959 seconds Bar Bar65 66
Bars Made: 13300
Big Gap: 0.500999927521 seconds Bar Bar72 73
Bars Made: 13400
Bars Made: 13500
Big Gap: 3.83699989319 seconds Bar Bar35 36
Bars Made: 13600
Bars Made: 13700
Big Gap: 0.500999927521 seconds Bar Bar65 66
Bars Made: 13800
Big Gap: 3.90399980545 seconds Bar Bar4 5
Bars Made: 13900
Bars Made: 14000
Bars Made: 14100
Big Gap: 3.95200014114 seconds Bar Bar75 76
Bars Made: 14200
Big Gap: 0.500999927521 seconds Bar Bar38 39
Bars Made: 14300
I trust readers will excuse me for attaching a small PNG graphic that
clearly shows this behaviour: CPU utilisation climbs to around 100%, but
then starts dropping intermittently - this *has* to be paging activity.
When CPU activity dives just after 300 seconds that's when I hit CTRL/C.
The subsequent 800 seconds was spent unwinding the memory so it could be
deallocated!
Bars Made: 14400
Big Gap: 0.799000024796 seconds Bar Bar0 1
Traceback (most recent call last):
File "test40.py", line 57, in <module>
final[i] = Bar(nbar,nfoo)
File "test40.py", line 15, in __init__
b[i]=Foo(nfoo) # make a foo, add it to list
File "test40.py", line 6, in __init__
def __init__(me,nfoo):
KeyboardInterrupt
[LOOOOOOOOONG PAUSE ...}
[EMAIL PROTECTED] ~/Projects/Python
$
I might add that it took a long time for Windows performance to recover,
since any task that I subsequently touched had to be brought back in
from suspension in the paging file.
This is one of those cases where it would be faster to avoid using
garbage collection at all, since no cyclic structures appear to be
created. But there's an indication of just how much memory you are using
in the length of time the program takes between reporting a
KeyboardInterrupt and returning to the command line prompt: this is
precisely the length of time it take to deallocate all those lists and
return them to the free memory pool.
So basically what you have here is a pathological example of why it's
sometimes wise to disable garbage collection. Tim, did I miss anything?
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
--
http://mail.python.org/mailman/listinfo/python-list