Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-09-04 Thread Amaury Forgeot d'Arc
Hello,

Andrey Zhmoginov wrote:
> I don't know if the following question is relevant, but it seems that many
> people here are familiar with the Python cyclic garbage collector.
> I see Python [v2.5.2 (r252:60911, Jul 31 2008, 17:28:52)] crashing with a
> segmentation fault when I extend Python with a very simple module. This behavior
> is observed when I create a thousand lists (it does not crash with
> 10-100) in the module with the garbage collector turned on. When I turn it
> off, everything is perfect. I suspect that it is my module, but if it is a
> Python bug (_GC_Malloc? a memory leak somewhere?), it may be worth reporting.
>
> The gdb "where" reply is the following:
>
>   #0  0x080d8de9 in PyErr_Occurred ()
>   #1  0x080f508f in _PyObject_GC_Malloc ()
>   #2  0x080f5155 in _PyObject_GC_New ()
>   #3  0x08079c98 in PyList_New ()
>   #4  0xb7f53519 in draw_d_simple () from ./rt/rt_core.so
>   #5  0xb7cf7833 in ffi_call_SYSV () from
> /usr/lib/python2.5/lib-dynload/_ctypes.so
>   #6  0xb7cf766a in ffi_call () from
> /usr/lib/python2.5/lib-dynload/_ctypes.so
>   #7  0xb7cf2534 in _CallProc () from
> /usr/lib/python2.5/lib-dynload/_ctypes.so
>   #8  0xb7cec02a in ?? () from /usr/lib/python2.5/lib-dynload/_ctypes.so
>   #9  0x0805cb97 in PyObject_Call ()
>   #10 0x080c7aa7 in PyEval_EvalFrameEx ()
>   #11 0x080c96e5 in PyEval_EvalFrameEx ()
>   #12 0x080cb1f7 in PyEval_EvalCodeEx ()
>   #13 0x080cb347 in PyEval_EvalCode ()
>   #14 0x080ea818 in PyRun_FileExFlags ()
>   #15 0x080eaab9 in PyRun_SimpleFileExFlags ()
>   #16 0x08059335 in Py_Main ()
>   #17 0x080587f2 in main ()
>
> The crashing python code is:
>
>   from ctypes import *
>   import gc
>   core = CDLL( "./tst.so" )
>   core.a.argtypes = []
>   core.a.restype = py_object
>   #gc.disable()
>   core.a()
>   #gc.enable()
>
> And tst.cpp is:
>
>   #include <Python.h>
>   extern "C" { PyObject *a(); }
>   PyObject *a()
>   {
>   int n = 1000;
>   PyObject *item;
>   for ( int i = 0; i < n; i++ ) item = PyList_New( 0 );  // Crashes here (somewhere in between).
>   return item;
>   }
>
> tst.cpp is compiled with:
>
>   g++ -shared -Wl,-soname,tst.tmp.so -o tst.so tst.o
>   g++ -I /usr/include/python2.5 -Wall -fPIC -o tst.o -c tst.cpp

This has nothing to do with garbage collection, but with the GIL
(the famous Global Interpreter Lock, http://docs.python.org/api/threads.html ).

When ctypes calls a CDLL function, it releases the GIL (to let other
threads, if any, run). Your example crashes because it calls Python API
functions while the GIL is not held, and is therefore invalid.

There are two solutions:
- re-acquire the GIL in your C functions with PyGILState_Ensure() & co
- use ctypes.PyDLL( "./tst.so" ), which does not release the GIL.
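For illustration, a minimal sketch of the second option (assuming the same
tst.so as above); PyDLL keeps the GIL held across the call, so the Python
API calls inside a() remain valid:

    from ctypes import PyDLL, py_object

    core = PyDLL( "./tst.so" )   # unlike CDLL, PyDLL does not release the GIL
    core.a.argtypes = []
    core.a.restype = py_object
    result = core.a()            # safe: a() runs with the GIL held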

Hope this helps,

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-09-03 Thread Andrey Zhmoginov


Alexandre Vassalotti wrote:
> Would anyone mind if I did add a public C API for gc.disable() and
> gc.enable()? I would like to use it as an optimization for the pickle
> module (I found out that I get a good 2x speedup just by disabling the
> GC while loading large pickles). Of course, I could simply import the
> gc module and call the functions there, but that seems overkill to me.
> I included the patch below for review.
I don't know if the following question is relevant, but it seems that
many people here are familiar with the Python cyclic garbage collector.
I see Python [v2.5.2 (r252:60911, Jul 31 2008, 17:28:52)] crashing with a
segmentation fault when I extend Python with a very simple module. This
behavior is observed when I create a thousand lists (it does not
crash with 10-100) in the module with the garbage collector turned on.
When I turn it off, everything is perfect. I suspect that it is my
module, but if it is a Python bug (_GC_Malloc? a memory leak somewhere?),
it may be worth reporting.


The gdb "where" reply is the following:

   #0  0x080d8de9 in PyErr_Occurred ()
   #1  0x080f508f in _PyObject_GC_Malloc ()
   #2  0x080f5155 in _PyObject_GC_New ()
   #3  0x08079c98 in PyList_New ()
   #4  0xb7f53519 in draw_d_simple () from ./rt/rt_core.so
   #5  0xb7cf7833 in ffi_call_SYSV () from 
/usr/lib/python2.5/lib-dynload/_ctypes.so
   #6  0xb7cf766a in ffi_call () from 
/usr/lib/python2.5/lib-dynload/_ctypes.so
   #7  0xb7cf2534 in _CallProc () from 
/usr/lib/python2.5/lib-dynload/_ctypes.so

   #8  0xb7cec02a in ?? () from /usr/lib/python2.5/lib-dynload/_ctypes.so
   #9  0x0805cb97 in PyObject_Call ()
   #10 0x080c7aa7 in PyEval_EvalFrameEx ()
   #11 0x080c96e5 in PyEval_EvalFrameEx ()
   #12 0x080cb1f7 in PyEval_EvalCodeEx ()
   #13 0x080cb347 in PyEval_EvalCode ()
   #14 0x080ea818 in PyRun_FileExFlags ()
   #15 0x080eaab9 in PyRun_SimpleFileExFlags ()
   #16 0x08059335 in Py_Main ()
   #17 0x080587f2 in main ()

The crashing python code is:

   from ctypes import *
   import gc
   core = CDLL( "./tst.so" )
   core.a.argtypes = []
   core.a.restype = py_object
   #gc.disable()
   core.a()
   #gc.enable()

And tst.cpp is:

   #include <Python.h>
   extern "C" { PyObject *a(); }
   PyObject *a()
   {
   int n = 1000;
   PyObject *item;
   for ( int i = 0; i < n; i++ ) item = PyList_New( 0 );  // Crashes here (somewhere in between).
   return item;
   }

tst.cpp is compiled with:

   g++ -shared -Wl,-soname,tst.tmp.so -o tst.so tst.o
   g++ -I /usr/include/python2.5 -Wall -fPIC -o tst.o -c tst.cpp

Thanks for your help!

- Andrew.


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-26 Thread Greg Ewing

Jeff Hall wrote:
> I mistakenly thought that was
> because they were assumed to be small. It sounds like they're ignored
> because they're automatically collected and so they SHOULD be ignored
> for object garbage collection.


Strings aren't tracked by the cyclic garbage collector
because they don't contain object references and therefore
can't form part of a cycle.

However, unless I'm mistaken, allocations and deallocations
of them are still counted for the purpose of determining
when to perform a cyclic GC pass. So if you allocate lots
of strings and they aren't getting deallocated, a cyclic
GC pass will eventually occur, in case the strings are
being referenced from a cycle that can be cleaned up.

I don't know whether/how re uses string objects internally
while it's matching, so I can't say what its garbage
collection characteristics might be when matching against
a huge string.

The behaviour you observed might have been due to the
nature of the re being matched -- some res can have
quadratic or exponential behaviour all by themselves.

--
Greg


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Alexandre Vassalotti
On Thu, Jun 26, 2008 at 12:01 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> Would it be possible, if not a good idea, to only track object
>> deallocations as the GC traversal trigger? As far as I know, dangling
>> cyclic references cannot be formed when allocating objects.
>
> Not sure what you mean by that.
>
> x = []
> x.append(x)
> del x
>
> creates a cycle with no deallocation occurring.
>

Oh... never mind then.

-- Alexandre


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Martin v. Löwis
> Would it be possible, if not a good idea, to only track object
> deallocations as the GC traversal trigger? As far as I know, dangling
> cyclic references cannot be formed when allocating objects.

Not sure what you mean by that.

x = []
x.append(x)
del x

creates a cycle with no deallocation occurring.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Alexandre Vassalotti
On Wed, Jun 25, 2008 at 4:55 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> I think exactly the other way 'round. The timing of thing should not
> matter at all, only the exact sequence of allocations and deallocations.

Would it be possible, if not a good idea, to only track object
deallocations as the GC traversal trigger? As far as I know, dangling
cyclic references cannot be formed when allocating objects. So, this
could potentially mitigate the quadratic behavior during allocation
bursts.

-- Alexandre


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Martin v. Löwis
> I took the statement, "Current GC only takes into account container
> objects, which, most significantly, ignores string objects (of which
> most applications create plenty)" to mean that strings were ignored for
> deciding when to do garbage collection. I mistakenly thought that was
> because they were assumed to be small. It sounds like they're ignored
> because they're automatically collected and so they SHOULD be ignored
> for object garbage collection. 

More precisely, a string object can never participate in a cycle (it
can be referenced from a cycle, but not be *in* the cycle, as it
has no references to other objects). GC in Python is only about
container objects (which potentially can be cyclic); non-container
objects are released when the refcount says they are no longer
referenced.
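
For illustration, a minimal sketch of that distinction (not from the
original mail; the printed count is whatever gc reports as unreachable):

    import gc

    class Node(object):
        pass

    n = Node()
    n.self = n           # the Node is *in* a cycle
    n.name = "payload"   # the string is only referenced *from* the cycle
    del n                # refcounting alone cannot free the cycle
    print gc.collect()   # cyclic GC frees the Node; the string then dies
                         # by ordinary refcounting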

Whether or not allocation of definitely-non-cyclic objects should
still trigger cyclic GC (in the hope that some objects hang on a
garbage cycle) is a question that is open to debate; I'd prefer an
analysis of existing applications before making decisions.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Jeff Hall
On Wed, Jun 25, 2008 at 4:55 PM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:

> > It seems to me that the root problem is allocation spikes of legitimate,
> > useful data. Perhaps then we need some sort of "test" to determine if
> > those are legitimate. Perhaps checking every nth (with n decreasing as
> > allocation bytes increases) object allocated during a "spike" could be
> > useful. Then delay garbage collection until x consecutive objects are
> > found to be garbage?
> >
> > It seems like we should be attacking the root cause rather than finding
> > some convoluted math that attempts to work for all scenarios.
>
> I think exactly the other way 'round. The timing of things should not
> matter at all, only the exact sequence of allocations and deallocations.
>
> I trust provable maths much more than I trust ad-hoc heuristics, even
> if you think the math is convoluted.
>

I probably chose my wording poorly (particularly for a newcomer/outsider).
What I meant was that the numbers used in GC currently appear arbitrary. The
idea of three "groups" (youngest, oldest and middle) is also arbitrary.
Would it not be better to tear that system apart and create a "sliding"
scale? If the timing method is undesirable, then make it slide based on the
allocation/deallocation difference. In this way, the current breakpoints and
number of groups (all of which are arbitrary and fixed) could be replaced by
one coefficient (and yes, I recognize that it would also be arbitrary but it
would be one, tweakable number rather than several).

My gut tells me that your current fix is going to work just fine for now but
we're going to end up tweaking it (or at least discussing tweaking it) every
6-12 months.


> > On a side note, the information about not GCing on string objects is
> > interesting. Is there a way to override this behavior?
>
> I think you misunderstand. Python releases unused string objects just
> fine, and automatically. It doesn't even need GC for that.
>

I took the statement, "Current GC only takes into account container objects,
which, most significantly, ignores string objects (of which most
applications create plenty)" to mean that strings were ignored for deciding
when to do garbage collection. I mistakenly thought that was because they
were assumed to be small. It sounds like they're ignored because they're
automatically collected and so they SHOULD be ignored for object garbage
collection. Thanks for the clarification... Back to the drawing board on my
other problem ;)

-- 
Haikus are easy
Most make very little sense
Refrigerator


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Martin v. Löwis
> It seems to me that the root problem is allocation spikes of legitimate,
> useful data. Perhaps then we need some sort of "test" to determine if
> those are legitimate. Perhaps checking every nth (with n decreasing as
> allocation bytes increases) object allocated during a "spike" could be
> useful. Then delay garbage collection until x consecutive objects are
> found to be garbage?
> 
> It seems like we should be attacking the root cause rather than finding
> some convoluted math that attempts to work for all scenarios.

I think exactly the other way 'round. The timing of things should not
matter at all, only the exact sequence of allocations and deallocations.

I trust provable maths much more than I trust ad-hoc heuristics, even
if you think the math is convoluted.

> On a side note, the information about not GCing on string objects is
> interesting. Is there a way to override this behavior?

I think you misunderstand. Python releases unused string objects just
fine, and automatically. It doesn't even need GC for that.

Regards,
Martin



Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-25 Thread Jeff Hall
It seems to me that the root problem is allocation spikes of legitimate,
useful data. Perhaps then we need some sort of "test" to determine if those
are legitimate. Perhaps checking every nth (with n decreasing as allocation
bytes increases) object allocated during a "spike" could be useful. Then
delay garbage collection until x consecutive objects are found to be
garbage?

It seems like we should be attacking the root cause rather than finding some
convoluted math that attempts to work for all scenarios.

On a side note, the information about not GCing on string objects is
interesting. Is there a way to override this behavior? I've found that re.py
chokes on large text files (4MB plus) without line ends (don't ask, they're
not our files but we have to parse them). I wonder if this isn't the
reason...

If the answer to that is "no, strings are always ignored", I'd recommend
rethinking this (or providing an option to override somehow).

-- 
Haikus are easy
Most make very little sense
Refrigerator


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-24 Thread Antoine Pitrou
Martin v. Löwis <[EMAIL PROTECTED]> writes:
> 
> I'd like to see in an experiment whether this is really true.

Right, all those ideas should be implemented and tried out. I don't really have 
time to spend on it right now.

Also, what's missing is a suite of performance/efficiency tests for the garbage
collector.





Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-23 Thread Martin v. Löwis
> It would not help the quadratic behaviour - and is orthogonal to your
> proposal - but at least it would avoid calling the GC too often when lots
> of small objects are allocated (as opposed to lots of large objects).

I'd like to see in an experiment whether this is really true. Current GC
only takes into account container objects, which, most significantly,
ignores string objects (of which most applications create plenty). If
you make actual memory consumption a trigger, the string objects would
count towards triggering GC. That may or may not be the better approach,
anyway - but it's not certain (for me) that such a scheme would cause GC
to be triggered less often, in a typical application.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-23 Thread Antoine Pitrou
Martin v. Löwis <[EMAIL PROTECTED]> writes:
> Currently, only youngest collections are triggered by allocation
> rate; middle and old are triggered by frequency of youngest collection.
> So would you now specify that the youngest collection should occur
> if-and-only-if a new arena is allocated? Or discount arenas returned
> from arenas allocated?

The latter sounds reasonable. IIRC an arena is 256KB, which is smaller than an
entry-level L2 cache. Therefore, waiting for an arena to be filled shouldn't
hurt cache locality much.

To avoid situations where the GC is never called we could combine that with an 
allocation counter, but with a much higher threshold than currently.

> Or apply this to triggering other generation
> collections but youngest? How would that help the quadratic behavior
> (which really needs to apply a factor somewhere)?

It would not help the quadratic behaviour - and is orthogonal to your
proposal - but at least it would avoid calling the GC too often when lots of
small objects are allocated (as opposed to lots of large objects).





Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-22 Thread Martin v. Löwis
> pymalloc needing to allocate a new arena would be a different way to
> track an excess of allocations over deallocations, and in some ways
> more sensible (since it would reflect an excess of /bytes/ allocated
> over bytes freed, rather than an excess in the counts of objects
> allocated-over-freed regardless of their sizes -- an implication is,
> e.g., that cyclic gc would be triggered much less frequently by mass
> creation of small tuples than of small dicts, since a small tuple
> consumes much less memory than a small dict).
> 
> Etc. ;-)

:-) So my question still is: how exactly?

Currently, only youngest collections are triggered by allocation
rate; middle and old are triggered by frequency of youngest collection.
So would you now specify that the youngest collection should occur
if-and-only-if a new arena is allocated? Or discount arenas returned
from arenas allocated? Or apply this to triggering other generation
collections but youngest? How would that help the quadratic behavior
(which really needs to apply a factor somewhere)?

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-22 Thread Tim Peters
[Antoine Pitrou]
>> Would it be helpful if the GC was informed of memory growth by the
>> Python memory allocator (that is, each time it either asks or gives back
>> a block of memory to the system allocator) ?

[Martin v. Löwis]
> I don't see how. The garbage collector is already informed about memory
> growth; it learns exactly when a container object is allocated or
> deallocated. That the allocator then requests memory from the system
> only confirms what the garbage collector already knew: that there are
> lots of allocated objects. From that, one could infer that it might
> be time to perform garbage collection - or one could infer that all
> the objects are really useful, and no garbage can be collected.

Really the same conundrum we currently face:  cyclic gc is currently
triggered by reaching a certain /excess/ of allocations over
deallocations.  From that we /do/ infer it's time to perform garbage
collection -- but, as some examples here showed, it's sometimes really
the case that the true meaning of the excess is that "all the objects
are really useful, and no garbage can be collected -- and I'm creating
a lot of them".

pymalloc needing to allocate a new arena would be a different way to
track an excess of allocations over deallocations, and in some ways
more sensible (since it would reflect an excess of /bytes/ allocated
over bytes freed, rather than an excess in the counts of objects
allocated-over-freed regardless of their sizes -- an implication is,
e.g., that cyclic gc would be triggered much less frequently by mass
creation of small tuples than of small dicts, since a small tuple
consumes much less memory than a small dict).

Etc. ;-)


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Stephen J. Turnbull
"Martin v. Löwis" writes:
 > > XEmacs implements this strategy in a way which is claimed to give
 > > constant amortized time (ie, averaged over memory allocated).
 > 
 > See my recent proposal.

I did, crossed in the mail.  To the extent that I understand both
systems, your proposal looks like an improvement over what we've got.

 > > However, isn't the real question whether there is memory pressure or
 > > not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
 > > spike might have no observable consequences.  How about a policy of
 > > GC-ing with decreasing period ("time" measured by bytes allocated or
 > > number of allocations) as the fraction of memory used increases,
 > > starting from a pretty large fraction (say 50% by default)?
 > 
 > The problem with such an approach is that it is very difficult to
 > measure.

On some systems it should be possible to get information about how
much paging is taking place, which would be an indicator of pressure.

 > What to do about virtual memory?

If you're in virtual memory, you're already in trouble.  Once you're
paging, you need to increase time between GCs, so the rule would not
be monotonic.


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Nick Coghlan

Martin v. Löwis wrote:
> Antoine Pitrou wrote:
>> On Saturday 21 June 2008 at 17:49 +0200, "Martin v. Löwis" wrote:
>>> I don't think any strategies based on timing will be successful.
>>> Instead, one should count and analyze objects (although I'm unsure
>>> how exactly that could work).
>>
>> Would it be helpful if the GC was informed of memory growth by the
>> Python memory allocator (that is, each time it either asks or gives back
>> a block of memory to the system allocator) ?
>
> I don't see how. The garbage collector is already informed about memory
> growth; it learns exactly when a container object is allocated or
> deallocated. That the allocator then requests memory from the system
> only confirms what the garbage collector already knew: that there are
> lots of allocated objects. From that, one could infer that it might
> be time to perform garbage collection - or one could infer that all
> the objects are really useful, and no garbage can be collected.


I was wondering whether it might be useful to detect the end of an 
allocation spike: if PyMalloc incremented an internal counter each time 
it requested more memory from the system, then the GC could check 
whether that number had changed on each collection cycle. If the GC goes 
a certain number of collection cycles without more memory being 
requested from the system, then it can assume the allocation spike is 
over and tighten up the default tuning again.


Such a pymalloc counter would provide a slightly more holistic view of 
overall memory usage changes than the container-focused view of the 
cyclic GC.
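
A toy sketch of that heuristic (hypothetical names; the real change would
live inside the interpreter, not in Python code):

    def spike_over(arena_counts, window=3):
        """True once the arena counter has stopped changing for
        `window` consecutive GC cycles, i.e. the spike has ended."""
        recent = arena_counts[-window:]
        return len(recent) == window and len(set(recent)) == 1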


Cheers,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://www.boredomandlaziness.org


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> XEmacs implements this strategy in a way which is claimed to give
> constant amortized time (ie, averaged over memory allocated).

See my recent proposal. The old trick is to do a reorganization each time
the total size grows by a fixed fraction, resulting in amortized-constant
overhead per increase (assuming each reorganization takes time linear in
the total size).

> However, isn't the real question whether there is memory pressure or
> not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
> spike might have no observable consequences.  How about a policy of
> GC-ing with decreasing period ("time" measured by bytes allocated or
> number of allocations) as the fraction of memory used increases,
> starting from a pretty large fraction (say 50% by default)?

The problem with such an approach is that it is very difficult to
measure. What to do about virtual memory? What to do about other
applications that also consume memory?

On some systems (Windows in particular), the operating system indicates
memory pressure through some IPC mechanism; on such systems, it might
be reasonable to perform garbage collection (only?) when the system asks
for it. However, the system might not ask for GC while the swap space
is still not exhausted, meaning that the deferred GC would take a long
time to complete (having to page in every object).

> Nevertheless, I think the real solution has to be for Python
> programmers to be aware that there is GC, and that they can tune it.

I don't think there is a "real solution". I think programmers should
abstain from complaining if they can do something about the problem
in their own application (unless the complaint is formulated as a
patch) - wait - I think programmers should abstain from complaining,
period.

Regards,
Martin



Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Stephen J. Turnbull
"Martin v. Löwis" writes:

 > Given the choice of "run slower" and "run out of memory", Python should
 > always prefer the former.
 > 
 > One approach could be to measure how successful a GC run was: if GC
 > finds that more-and-more objects get allocated and very few (or none)
 > are garbage, it might conclude that this is an allocation spike, and
 > back off. The tricky question is how to find out that the spike is
 > over.

XEmacs implements this strategy in a way which is claimed to give
constant amortized time (ie, averaged over memory allocated).  I
forget the exact parameters, but ISTR that the period ("time"
measured by bytes allocated) is simply proportional to currently allocated
memory.  Some people claim this is much more comfortable than the
traditional "GC after N bytes are allocated" algorithm but I don't
notice much difference.  I don't know how well this intuition carries
over to noninteractive applications.

In XEmacs experimenting with such strategies is pretty easy, since the
function that determines period is only a few lines long.  I assume
that would be true of Python, too.

However, isn't the real question whether there is memory pressure or
not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
spike might have no observable consequences.  How about a policy of
GC-ing with decreasing period ("time" measured by bytes allocated or
number of allocations) as the fraction of memory used increases,
starting from a pretty large fraction (say 50% by default)?  The total
amount of memory could be a soft limit, defaulting to the amount of
fast memory actually available.

For interactive and maybe some batch applications, it might be
appropriate to generate a runtime warning as memory use approches some
limits, too.

Nevertheless, I think the real solution has to be for Python
programmers to be aware that there is GC, and that they can tune it.


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
Antoine Pitrou wrote:
> On Saturday 21 June 2008 at 17:49 +0200, "Martin v. Löwis" wrote:
>> I don't think any strategies based on timing will be successful.
>> Instead, one should count and analyze objects (although I'm unsure
>> how exactly that could work).
> 
> Would it be helpful if the GC was informed of memory growth by the
> Python memory allocator (that is, each time it either asks or gives back
> a block of memory to the system allocator) ?

I don't see how. The garbage collector is already informed about memory
growth; it learns exactly when a container object is allocated or
deallocated. That the allocator then requests memory from the system
only confirms what the garbage collector already knew: that there are
lots of allocated objects. From that, one could infer that it might
be time to perform garbage collection - or one could infer that all
the objects are really useful, and no garbage can be collected.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Antoine Pitrou

On Saturday 21 June 2008 at 17:49 +0200, "Martin v. Löwis" wrote:
> I don't think any strategies based on timing will be successful.
> Instead, one should count and analyze objects (although I'm unsure
> how exactly that could work).

Would it be helpful if the GC was informed of memory growth by the
Python memory allocator (that is, each time it either asks or gives back
a block of memory to the system allocator) ?




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Terry Reedy



Kevin Jacobs <[EMAIL PROTECTED]> wrote:
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind
> of quadratic time complexity I've demonstrated.  This is not because
> they are stupid, but because they have only a vague idea that Python
> even has a garbage collector, never mind that it could be behaving badly
> for such innocuous looking code.


As I understand it, gc is needed now more than ever because new-style
classes make reference cycles more common.  On the other hand, greatly 
increased RAM size (from some years ago) makes megaobject bursts 
possible.  Such large bursts move the hidden quadratic do-nothing drag 
out of the relatively flat part of the curve (total time just double or 
triple what it should be) to where it can really bite.  Leaving aside 
what you do for your local group, can we better warn Python programmers 
now, for the upcoming 2.5, 2.6, and 3.0 releases?


Paragraph 3 of the Reference Manual chapter on Data Model (3.0 version) says:
"Objects are never explicitly destroyed; however, when they become 
unreachable they may be garbage-collected. An implementation is allowed 
to postpone garbage collection or omit it altogether — it is a matter of 
implementation quality how garbage collection is implemented, as long as 
no objects are collected that are still reachable. (Implementation note: 
the current implementation uses a reference-counting scheme with 
(optional) delayed detection of cyclically linked garbage, which 
collects most objects as soon as they become unreachable, but is not 
guaranteed to collect garbage containing circular references. See the 
documentation of the gc module for information on controlling the 
collection of cyclic garbage.)"

I am not sure what to add here (especially for those who do not read it ;-).

The Library Manual gc section says "Since the collector supplements the
reference counting already used in Python, you can disable the collector
if you are sure your program does not create reference cycles."  Perhaps
it should also say "You should disable it when creating millions of
objects without cycles".
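
For instance, a minimal sketch of what that advice amounts to in user code
(illustrative only):

    import gc

    gc.disable()                   # pause cyclic GC for the burst
    try:
        data = [str(i) for i in xrange(10**6)]   # many acyclic objects
    finally:
        gc.enable()                # restore normal collection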


The installed documentation set (on Windows, at least) includes some
Python HOWTOs.  If one were added on Space Management (implementations, 
problems, and solutions), would your developers read it?


> Maybe we should consider more carefully before declaring the status quo
> sufficient.  Average developers do allocate millions of objects in
> bursts and super-linear time complexity for such operations is not
> acceptable.  Thankfully I am around to help my programmers work around
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or
> whatever since Python was inexplicably "too slow" for "real work".  This
> being open source, I'm certainly willing to help in the effort to do so,
> but not if potential solutions will be ruled out as being unnecessary.


To me, 'sufficient' (time-dependent) and 'necessary' are either too
vague or too strict to bring about what you want -- change.  This is
the third thread I have read (here + c.l.p) on default-mode gc problems
(but all in the last couple of years or so).  So, especially with the
nice table someone posted recently on time with and without gc, and
considering that installed RAM continues to grow, I am persuaded that an
improvement to the default behavior that does not negatively impact the
vast majority would be desirable.


Terry Jan Reedy



Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Bill Janssen
> > What follows from that? To me, the natural conclusion is "people who
> > witness performance problems just need to despair, or accept them, as
> > they can't do anything about it", however, I don't think this is the
> > conclusion that you had in mind.
> >
> 
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind of
> quadratic time complexity I've demonstrated.  This is not because they are
> stupid, but because they have only a vague idea that Python even has a
> garbage collector, never mind that it could be behaving badly for such
> innocuous looking code.

Perhaps this is something documentation could help with.  I'm thinking of a
one-page checklist listing places they might look for performance
problems, that your programmers could work through.

Bill


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> I'm not sure I agree with this.  GC IIRC was introduced primarily to
> alleviate *long-term* memory starvation.

I don't think that's historically the case. GC would not need to be
generational if releasing short-lived objects shortly after they become
garbage was irrelevant. Of course, it was always expected that much
memory is released through mere reference counting, and that GC only
kicks in "after some time". However "after some time" was changed from
"after 5000 allocations" to "after 700 allocations" in


r17274 | jhylton | 2000-09-05 17:44:50 +0200 (Tue, 05 Sep 2000) | 2 lines
Changed paths:
   M /python/trunk/Modules/gcmodule.c

compromise value for threshold0: not too high, not too low



Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Aahz
On Sat, Jun 21, 2008, "Martin v. Löwis" wrote:
> 
> In general, any solution of the "do GC less often" needs to deal with
> cases where lots of garbage gets produced in a short amount of time
> (e.g. in a tight loop), and which run out of memory when GC is done less
> often.
> 
> Given the choice of "run slower" and "run out of memory", Python should
> always prefer the former.

I'm not sure I agree with this.  GC IIRC was introduced primarily to
alleviate *long-term* memory starvation.  You are now IMO adding a new
goal for GC that has not been previously articulated.  I believe this
requires consensus rather than a simple declaration of principle.

Guido's opinion if he has one obviously overrules.  ;-)  Guido?
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> Idea 1: Allow GC to run automatically no more often than n CPU seconds,
> n being perhaps 5 or 10.

I think it's very easy to exhaust the memory with such a policy, even
though much memory would still be available. Worse, in a program
producing a lot of garbage, performance will degrade significantly as
Python starts thrashing the swap space.

> Idea 2: Allow GC to run no more often than f(n) CPU seconds, where n is
> the time taken by the last GC round.

How would that take the incremental GC into account? (i.e. what is
"the time taken by the last GC round"?)

Furthermore, the GC run time might well be under the resolution of the
CPU seconds clock.

> These limits could be reset or scaled by the GC collecting more than n%
> of the generation 0 objects or maybe the number of PyMalloc arenas
> increasing by a certain amount?

I don't think any strategies based on timing will be successful.
Instead, one should count and analyze objects (although I'm unsure
how exactly that could work).

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Kevin Jacobs <[EMAIL PROTECTED]>
On Sat, Jun 21, 2008 at 11:20 AM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:

> In general, any solution of the "do GC less often" needs to deal with
> cases where lots of garbage gets produced in a short amount of time
> (e.g. in a tight loop), and which run out of memory when GC is done less
> often.
>


Idea 1: Allow GC to run automatically no more often than n CPU seconds, n
being perhaps 5 or 10.
Idea 2: Allow GC to run no more often than f(n) CPU seconds, where n is the
time taken by the last GC round.

These limits could be reset or scaled by the GC collecting more than n% of
the generation 0 objects or maybe the number of PyMalloc arenas increasing
by a certain amount?

-Kevin




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> Well, they could hang themselves or switch to another language (which
> some people might view as equivalent :-)), but perhaps optimistically
> the various propositions that were sketched out in this thread (by Adam
> Olsen and Greg Ewing) could bring an improvement. I don't know how
> realistic they are, perhaps an expert would have an answer.

In general, any solution of the "do GC less often" needs to deal with
cases where lots of garbage gets produced in a short amount of time
(e.g. in a tight loop), and which run out of memory when GC is done less
often.

Given the choice of "run slower" and "run out of memory", Python should
always prefer the former.

One approach could be to measure how successful a GC run was: if GC
finds that more-and-more objects get allocated and very few (or none)
are garbage, it might conclude that this is an allocation spike, and
back off. The tricky question is how to find out that the spike is
over.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind
> of quadratic time complexity I've demonstrated.  This is not because
> they are stupid, but because they have only a vague idea that Python
> even has a garbage collector, never mind that it could be behaving badly
> for such innocuous looking code.
> 
> Maybe we should consider more carefully before declaring the status quo
> sufficient.

This was precisely my question: What follows from the above observation?

I personally didn't declare the status quo sufficient - I merely
declared it as being the status quo.

> Average developers do allocate millions of objects in
> bursts and super-linear time complexity for such operations is not
> acceptable.  Thankfully I am around to help my programmers work around
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or
> whatever since Python was inexplicably "too slow" for "real work".  This
> being open source, I'm certainly willing to help in the effort to do so,
> but not if potential solutions will be ruled out as being unnecessary.

I wouldn't rule out solutions as being unnecessary. I might rule out
solutions that negatively impact existing software, for the sake of
improving other existing software.

Unfortunately, the only way to find out whether a solution will be ruled
out is to propose it first. Only then you'll see what response you get.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Kevin Jacobs <[EMAIL PROTECTED]>
On Sat, Jun 21, 2008 at 4:33 AM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:

> > I don't think expecting people to tweak gc parameters when they witness
> > performance problems is reasonable.
>
> What follows from that? To me, the natural conclusion is "people who
> witness performance problems just need to despair, or accept them, as
> they can't do anything about it", however, I don't think this is the
> conclusion that you had in mind.
>

I can say with complete certainty that of the 20+ programmers I've had
working for me, many who have used Python for 3+ years, not a single one
would think to question the garbage collector if they observed the kind of
quadratic time complexity I've demonstrated.  This is not because they are
stupid, but because they have only a vague idea that Python even has a
garbage collector, never mind that it could be behaving badly for such
innocuous looking code.

Maybe we should consider more carefully before declaring the status quo
sufficient.  Average developers do allocate millions of objects in bursts
and super-linear time complexity for such operations is not acceptable.
Thankfully I am around to help my programmers work around such issues or
else they'd be pushing to switch to Java, Ruby, C#, or whatever since Python
was inexplicably "too slow" for "real work".  This being open source, I'm
certainly willing to help in the effort to do so, but not if potential
solutions will be ruled out as being unnecessary.

-Kevin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Antoine Pitrou
On Saturday 21 June 2008 at 10:33 +0200, "Martin v. Löwis" wrote:
> > I don't think expecting people to tweak gc parameters when they witness
> > performance problems is reasonable.
> 
> What follows from that? To me, the natural conclusion is "people who
> witness performance problems just need to despair, or accept them, as
> they can't do anything about it",

To me, Amaury's answer indeed implies that most people can't do anything
about it.

Well, they could hang themselves or switch to another language (which
some people might view as equivalent :-)), but perhaps optimistically
the various propositions that were sketched out in this thread (by Adam
Olsen and Greg Ewing) could bring an improvement. I don't know how
realistic they are, perhaps an expert would have an answer.

Regards

Antoine.




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> I don't think expecting people to tweak gc parameters when they witness
> performance problems is reasonable.

What follows from that? To me, the natural conclusion is "people who
witness performance problems just need to despair, or accept them, as
they can't do anything about it", however, I don't think this is the
conclusion that you had in mind.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-20 Thread Antoine Pitrou
On Friday 20 June 2008 at 17:44 +0200, Amaury Forgeot d'Arc wrote:
> In short: the gc is tuned for typical usage. If your usage of Python
> is atypical, use gc.set_threshold and increase its values.

It's fine for people "in the know" who take the time to test their code
using various gc parameters. But most people don't (I know I never did,
and until recently I didn't even know the gc could have such a
disastrous effect on performance).

I don't think expecting people to tweak gc parameters when they witness
performance problems is reasonable.




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-20 Thread Adam Olsen
On Fri, Jun 20, 2008 at 9:44 AM, Amaury Forgeot d'Arc
<[EMAIL PROTECTED]> wrote:
> 2008/6/20 Kevin Jacobs <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>:
>> On Fri, Jun 20, 2008 at 10:25 AM, Antoine Pitrou <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Kevin Jacobs <[EMAIL PROTECTED]> writes:
>>> >
>>> > +1 on a C API for enabling and disabling GC.  I have several instances
>>> > where I create a large number of non-cyclic objects where I see huge GC
>>> > overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).
>>>
>>> Could you try to post a stripped-down, self-contained example of such
>>> behaviour?
>>
>> $ python -m timeit 'zip(*[range(100)]*5)'
>> 10 loops, best of 3: 496 msec per loop
>>
>> $ python -m timeit -s 'import gc; gc.enable()' 'zip(*[range(100)]*5)'
>> 10 loops, best of 3: 2.93 sec per loop
>
> I remember that a similar issue was discussed some months ago:
> http://bugs.python.org/issue2607
>
> In short: the gc is tuned for typical usage. If your usage of Python
> is atypical, use gc.set_threshold and increase its values.

For very large bursts of allocation, tuning is no different from
disabling it outright, and disabling is simpler/more reliable.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-20 Thread Amaury Forgeot d'Arc
2008/6/20 Kevin Jacobs <[EMAIL PROTECTED]>:
> On Fri, Jun 20, 2008 at 10:25 AM, Antoine Pitrou <[EMAIL PROTECTED]>
> wrote:
>>
>> Kevin Jacobs <[EMAIL PROTECTED]> writes:
>> >
>> > +1 on a C API for enabling and disabling GC.  I have several instances
>> > where I create a large number of non-cyclic objects where I see huge GC
>> > overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).
>>
>> Could you try to post a stripped-down, self-contained example of such
>> behaviour?
>
> $ python -m timeit 'zip(*[range(100)]*5)'
> 10 loops, best of 3: 496 msec per loop
>
> $ python -m timeit -s 'import gc; gc.enable()' 'zip(*[range(100)]*5)'
> 10 loops, best of 3: 2.93 sec per loop

I remember that a similar issue was discussed some months ago:
http://bugs.python.org/issue2607

In short: the gc is tuned for typical usage. If your usage of Python
is atypical, use gc.set_threshold and increase its values.
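
For example, one possible (illustrative, untuned) adjustment:

    import gc

    print gc.get_threshold()          # the defaults are (700, 10, 10)
    gc.set_threshold(100000, 10, 10)  # run generation-0 collections far less often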

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-20 Thread Kevin Jacobs <[EMAIL PROTECTED]>
On Fri, Jun 20, 2008 at 10:25 AM, Antoine Pitrou <[EMAIL PROTECTED]>
wrote:

>
> Kevin Jacobs <[EMAIL PROTECTED]> writes:
> >
> > +1 on a C API for enabling and disabling GC.  I have several instances
> > where I create a large number of non-cyclic objects where I see huge GC
> > overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).
>
> Could you try to post a stripped-down, self-contained example of such
> behaviour?



$ python -m timeit 'zip(*[range(100)]*5)'
10 loops, best of 3: 496 msec per loop

$ python -m timeit -s 'import gc; gc.enable()' 'zip(*[range(100)]*5)'
10 loops, best of 3: 2.93 sec per loop

Note that timeit cheats and disables GC by default.

Attached is a less stripped-down script to demonstrate the super-linear
behavior for somewhat naively coded transpose operators.  The output is
listed below:

FUNCTION               ROWS     COLUMNS    GC ENABLED   GC DISABLED
---------------  ----------  ----------  ------------  ------------
transpose_comp            5           0        0.0000        0.0000
transpose_comp            5          25        0.5535        0.3537
transpose_comp            5          50        1.4359        0.6868
transpose_comp            5          75        2.7148        1.0760
transpose_comp            5         100        3.8070        1.3936
transpose_comp            5         125        5.5184        1.7617
transpose_comp            5         150        7.8828        2.1308
transpose_comp            5         175        9.3279        2.5364
transpose_comp            5         200       11.8248        2.7399
transpose_comp            5         225       14.7436        3.1585
transpose_comp            5         250       18.4452        3.5818
transpose_comp            5         275       21.4856        3.8988
transpose_comp            5         300       24.4110        4.3148
transpose_zip             5           0        0.0000        0.0000
transpose_zip             5          25        0.2537        0.0658
transpose_zip             5          50        0.8380        0.1324
transpose_zip             5          75        1.7507        0.1989
transpose_zip             5         100        2.6169        0.2648
transpose_zip             5         125        4.0760        0.3317
transpose_zip             5         150        5.8852        0.4145
transpose_zip             5         175        7.3925        0.5161
transpose_zip             5         200       10.0755        0.6708
transpose_zip             5         225       14.2698        0.7760
transpose_zip             5         250       16.7291        0.9022
transpose_zip             5         275       20.3833        1.0179
transpose_zip             5         300       24.5515        1.0971

Hope this helps,
-Kevin
import gc
import time

def transpose_comp(rows):
  return [ [ row[i] for row in rows ] for i in xrange(len(rows[0])) ]

def transpose_zip(rows):
  return zip(*rows)

def bench(func, rows, cols):
  gc.enable()
  gc.collect()

  data = [ range(cols) ]*rows

  t0 = time.time()

  func(data)

  t1 = time.time()
  gc.disable()

  func(data)

  t2 = time.time()
  gc.enable()

  return t1-t0,t2-t1

print 'FUNCTION               ROWS     COLUMNS    GC ENABLED   GC DISABLED'
print '---------------  ----------  ----------  ------------  ------------'

rows = 5
for func in [transpose_comp,transpose_zip]:
  for cols in range(0,301,25):
    enabled,disabled = bench(func, rows, cols)
    print '%-15s  %10d  %10d  %12.4f  %12.4f' % \
          (func.func_name,rows,cols,enabled,disabled)


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-20 Thread Antoine Pitrou

Hi,

Kevin Jacobs <[EMAIL PROTECTED]> writes:
>
> +1 on a C API for enabling and disabling GC.  I have several instances where
> I create a large number of non-cyclic objects where I see huge GC
> overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).

Could you try to post a stripped-down, self-contained example of such behaviour?

Antoine.




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-20 Thread Kevin Jacobs <[EMAIL PROTECTED]>
+1 on a C API for enabling and disabling GC.

I have several instances where I create a large number of non-cyclic
objects and see huge GC overhead (30+ seconds with gc enabled, 0.15
seconds when disabled).

+1000 to fixing the garbage collector to be smart enough to regulate
itself better.

In the mean time, I use the following context manager to deal with the
hotspots as I find them:

import gc

class gcdisabled(object):
  '''
  Context manager to temporarily disable Python's cyclic garbage collector.
  The primary use is to avoid thrashing while allocating large numbers of
  non-cyclic objects due to overly aggressive garbage collector behavior.

  Will disable GC if it is enabled upon entry and reenable it upon exit:

  >>> gc.isenabled()
  True
  >>> with gcdisabled():
  ...   print gc.isenabled()
  False
  >>> print gc.isenabled()
  True

  Will not reenable if GC was disabled upon entry:

  >>> gc.disable()
  >>> gc.isenabled()
  False
  >>> with gcdisabled():
  ...   gc.isenabled()
  False
  >>> gc.isenabled()
  False
  '''
  def __init__(self):
    self.isenabled = gc.isenabled()

  def __enter__(self):
    gc.disable()

  def __exit__(self, type, value, traceback):
    if self.isenabled:
      gc.enable()


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-19 Thread Greg Ewing

Alexandre Vassalotti wrote:
> Do you have any idea how this behavior could be fixed? I am not a GC
> expert, but I could try to fix this.


Perhaps after making a GC pass you could look at the
number of objects reclaimed during that pass, and if
it's less than some fraction of the objects in existence,
increase the threshold for performing GC by some
factor.

You would also want to do the opposite, so that a
GC pass which reclaims a large proportion of objects
would reduce the threshold back down again.
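
A user-level sketch of that idea (the real change would live in gcmodule.c;
the numbers here are arbitrary):

    import gc

    def adaptive_collect(fraction=0.05, factor=2, lo=700, hi=1000000):
        """Collect once, then adapt threshold0 to how productive it was."""
        live_before = len(gc.get_objects())
        reclaimed = gc.collect()       # number of unreachable objects found
        t0, t1, t2 = gc.get_threshold()
        if reclaimed < fraction * live_before:
            t0 = min(t0 * factor, hi)  # unproductive pass: collect less often
        else:
            t0 = max(t0 // factor, lo) # productive pass: tighten up again
        gc.set_threshold(t0, t1, t2)
        return reclaimed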

--
Greg


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-19 Thread Adam Olsen
On Thu, Jun 19, 2008 at 3:23 PM, Alexandre Vassalotti
<[EMAIL PROTECTED]> wrote:
> On Sun, Jun 1, 2008 at 12:28 AM, Adam Olsen <[EMAIL PROTECTED]> wrote:
>> On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
>> <[EMAIL PROTECTED]> wrote:
>>> Would anyone mind if I did add a public C API for gc.disable() and
>>> gc.enable()? I would like to use it as an optimization for the pickle
>>> module (I found out that I get a good 2x speedup just by disabling the
>>> GC while loading large pickles). Of course, I could simply import the
>>> gc module and call the functions there, but that seems overkill to me.
>>> I included the patch below for review.
>>
>> I'd rather see it fixed.  It behaves quadratically if you load enough
>> to trigger full collection a few times.
>>
>
> Do you have any idea how this behavior could be fixed? I am not a GC
> expert, but I could try to fix this.

Not sure.  For long-running programs we actually want a quadratic
cost, but spread out over a long period of time.  It's the bursts of
allocation that shouldn't be quadratic.

So maybe balance it by the total number of allocations that have
happened.  If the total number of allocations is less than 10 times
the allocations that weren't promptly deleted, assume it's a burst.
If it's more than 10 times, assume it's over a long period.


-- 
Adam Olsen, aka Rhamphoryncus


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-19 Thread Alexandre Vassalotti
On Sun, Jun 1, 2008 at 12:28 AM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
> <[EMAIL PROTECTED]> wrote:
>> Would anyone mind if I did add a public C API for gc.disable() and
>> gc.enable()? I would like to use it as an optimization for the pickle
>> module (I found out that I get a good 2x speedup just by disabling the
>> GC while loading large pickles). Of course, I could simply import the
>> gc module and call the functions there, but that seems overkill to me.
>> I included the patch below for review.
>
> I'd rather see it fixed.  It behaves quadratically if you load enough
> to trigger full collection a few times.
>

Do you have any idea how this behavior could be fixed? I am not a GC
expert, but I could try to fix this.

-- Alexandre


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-05-31 Thread Adam Olsen
On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
<[EMAIL PROTECTED]> wrote:
> Would anyone mind if I did add a public C API for gc.disable() and
> gc.enable()? I would like to use it as an optimization for the pickle
> module (I found out that I get a good 2x speedup just by disabling the
> GC while loading large pickles). Of course, I could simply import the
> gc module and call the functions there, but that seems overkill to me.
> I included the patch below for review.

I'd rather see it fixed.  It behaves quadratically if you load enough
to trigger full collection a few times.


-- 
Adam Olsen, aka Rhamphoryncus


[Python-Dev] C API for gc.enable() and gc.disable()

2008-05-31 Thread Alexandre Vassalotti
Would anyone mind if I did add a public C API for gc.disable() and
gc.enable()? I would like to use it as an optimization for the pickle
module (I found out that I get a good 2x speedup just by disabling the
GC while loading large pickles). Of course, I could simply import the
gc module and call the functions there, but that seems overkill to me.
I included the patch below for review.

-- Alexandre



Index: Include/objimpl.h
===================================================================
--- Include/objimpl.h   (revision 63766)
+++ Include/objimpl.h   (working copy)
@@ -221,8 +221,10 @@
  * ==
  */

-/* C equivalent of gc.collect(). */
+/* C equivalent of gc.collect(), gc.enable() and gc.disable(). */
 PyAPI_FUNC(Py_ssize_t) PyGC_Collect(void);
+PyAPI_FUNC(void) PyGC_Enable(void);
+PyAPI_FUNC(void) PyGC_Disable(void);

 /* Test if a type has a GC head */
 #define PyType_IS_GC(t) PyType_HasFeature((t), Py_TPFLAGS_HAVE_GC)
Index: Modules/gcmodule.c
===================================================================
--- Modules/gcmodule.c  (revision 63766)
+++ Modules/gcmodule.c  (working copy)
@@ -1252,6 +1252,18 @@
return n;
 }

+void
+PyGC_Disable(void)
+{
+enabled = 0;
+}
+
+void
+PyGC_Enable(void)
+{
+enabled = 1;
+}
+
 /* for debugging */
 void
 _PyGC_Dump(PyGC_Head *g)
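
For comparison, the pure-Python equivalent of the intended use in the pickle
module (a hypothetical helper, not part of the patch):

    import gc
    import pickle

    def load_large_pickle(f):
        """Load with the cyclic GC paused -- the ~2x speedup mentioned above."""
        was_enabled = gc.isenabled()
        gc.disable()
        try:
            return pickle.load(f)
        finally:
            if was_enabled:
                gc.enable()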