Re: [Python-Dev] C API for gc.enable() and gc.disable()
Hello,

Andrey Zhmoginov wrote:
> I don't know if the following question is relevant, but it seems that
> many people here are familiar with the Python cyclic garbage collector.
> I see Python [v2.5.2 (r252:60911, Jul 31 2008, 17:28:52)] crashing with
> a segmentation fault when I extend Python with a very simple module.
> This behavior is observed when I create a thousand lists (it does not
> crash with 10-100) in the module with the garbage collector turned on.
> When I turn it off, everything is perfect. I suspect that it is my
> module, but if it is a Python bug (_PyObject_GC_Malloc? a memory leak
> somewhere?), it may be worth reporting.
>
> The gdb "where" reply is the following:
>
> #0  0x080d8de9 in PyErr_Occurred ()
> #1  0x080f508f in _PyObject_GC_Malloc ()
> #2  0x080f5155 in _PyObject_GC_New ()
> #3  0x08079c98 in PyList_New ()
> #4  0xb7f53519 in draw_d_simple () from ./rt/rt_core.so
> #5  0xb7cf7833 in ffi_call_SYSV () from /usr/lib/python2.5/lib-dynload/_ctypes.so
> #6  0xb7cf766a in ffi_call () from /usr/lib/python2.5/lib-dynload/_ctypes.so
> #7  0xb7cf2534 in _CallProc () from /usr/lib/python2.5/lib-dynload/_ctypes.so
> #8  0xb7cec02a in ?? () from /usr/lib/python2.5/lib-dynload/_ctypes.so
> #9  0x0805cb97 in PyObject_Call ()
> #10 0x080c7aa7 in PyEval_EvalFrameEx ()
> #11 0x080c96e5 in PyEval_EvalFrameEx ()
> #12 0x080cb1f7 in PyEval_EvalCodeEx ()
> #13 0x080cb347 in PyEval_EvalCode ()
> #14 0x080ea818 in PyRun_FileExFlags ()
> #15 0x080eaab9 in PyRun_SimpleFileExFlags ()
> #16 0x08059335 in Py_Main ()
> #17 0x080587f2 in main ()
>
> The crashing Python code is:
>
> from ctypes import *
> import gc
> core = CDLL( "./tst.so" )
> core.a.argtypes = []
> core.a.restype = py_object
> #gc.disable()
> core.a()
> #gc.enable()
>
> And tst.cpp is:
>
> #include <Python.h>
> extern "C" { PyObject *a(); }
> PyObject *a()
> {
>     int n = 1000;
>     PyObject *item;
>     for ( int i = 0; i < n; i++ )
>         item = PyList_New( 0 );  // Crashes here (somewhere in between).
>     return item;
> }
>
> tst.cpp is compiled with:
>
> g++ -shared -Wl,-soname,tst.tmp.so -o tst.so tst.o
> g++ -I /usr/include/python2.5 -Wall -fPIC -o tst.o -c tst.cpp

This has nothing to do with garbage collection, but with the GIL (the
famous Global Interpreter Lock, http://docs.python.org/api/threads.html ).

When ctypes calls a CDLL function, it releases the GIL (to let other
threads run in the meantime). Your example crashes because it calls
Python API functions while the GIL is not held, which is invalid. There
are two solutions:

- re-acquire the GIL in your C functions with PyGILState_Ensure() & co.
- use ctypes.PyDLL( "./tst.so" ), which does not release the GIL.

Hope this helps,

--
Amaury Forgeot d'Arc
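To make the second suggestion concrete, here is a minimal sketch of the
ctypes-side fix; it assumes the same tst.so as above, and only the CDLL
to PyDLL change is essential:

from ctypes import PyDLL, py_object

# PyDLL is like CDLL but keeps the GIL held during the foreign call,
# which makes it safe for functions that use the Python C API.
core = PyDLL( "./tst.so" )
core.a.argtypes = []
core.a.restype = py_object

result = core.a()  # no GIL-related crash, with or without gc enabled

The first solution keeps CDLL but brackets the Python API calls inside
the C function with PyGILState_Ensure() and PyGILState_Release().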
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> Would anyone mind if I did add a public C API for gc.disable() and
> gc.enable()? I would like to use it as an optimization for the pickle
> module (I found out that I get a good 2x speedup just by disabling the
> GC while loading large pickles). Of course, I could simply import the
> gc module and call the functions there, but that seems overkill to me.
> I included the patch below for review.

I don't know if the following question is relevant, but it seems that
many people here are familiar with the Python cyclic garbage collector.
I see Python [v2.5.2 (r252:60911, Jul 31 2008, 17:28:52)] crashing with
a segmentation fault when I extend Python with a very simple module.
This behavior is observed when I create a thousand lists (it does not
crash with 10-100) in the module with the garbage collector turned on.
When I turn it off, everything is perfect. I suspect that it is my
module, but if it is a Python bug (_PyObject_GC_Malloc? a memory leak
somewhere?), it may be worth reporting.

The gdb "where" reply is the following:

#0  0x080d8de9 in PyErr_Occurred ()
#1  0x080f508f in _PyObject_GC_Malloc ()
#2  0x080f5155 in _PyObject_GC_New ()
#3  0x08079c98 in PyList_New ()
#4  0xb7f53519 in draw_d_simple () from ./rt/rt_core.so
#5  0xb7cf7833 in ffi_call_SYSV () from /usr/lib/python2.5/lib-dynload/_ctypes.so
#6  0xb7cf766a in ffi_call () from /usr/lib/python2.5/lib-dynload/_ctypes.so
#7  0xb7cf2534 in _CallProc () from /usr/lib/python2.5/lib-dynload/_ctypes.so
#8  0xb7cec02a in ?? () from /usr/lib/python2.5/lib-dynload/_ctypes.so
#9  0x0805cb97 in PyObject_Call ()
#10 0x080c7aa7 in PyEval_EvalFrameEx ()
#11 0x080c96e5 in PyEval_EvalFrameEx ()
#12 0x080cb1f7 in PyEval_EvalCodeEx ()
#13 0x080cb347 in PyEval_EvalCode ()
#14 0x080ea818 in PyRun_FileExFlags ()
#15 0x080eaab9 in PyRun_SimpleFileExFlags ()
#16 0x08059335 in Py_Main ()
#17 0x080587f2 in main ()

The crashing Python code is:

from ctypes import *
import gc
core = CDLL( "./tst.so" )
core.a.argtypes = []
core.a.restype = py_object
#gc.disable()
core.a()
#gc.enable()

And tst.cpp is:

#include <Python.h>
extern "C" { PyObject *a(); }
PyObject *a()
{
    int n = 1000;
    PyObject *item;
    for ( int i = 0; i < n; i++ )
        item = PyList_New( 0 );  // Crashes here (somewhere in between).
    return item;
}

tst.cpp is compiled with:

g++ -shared -Wl,-soname,tst.tmp.so -o tst.so tst.o
g++ -I /usr/include/python2.5 -Wall -fPIC -o tst.o -c tst.cpp

Thanks for your help!

- Andrew.
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Jeff Hall wrote:
> I mistakenly thought that was because they were assumed to be small. It
> sounds like they're ignored because they're automatically collected and
> so they SHOULD be ignored for object garbage collection.

Strings aren't tracked by the cyclic garbage collector because they
don't contain object references and therefore can't form part of a
cycle.

However, unless I'm mistaken, allocations and deallocations of them are
still counted for the purpose of determining when to perform a cyclic GC
pass. So if you allocate lots of strings and they aren't getting
deallocated, a cyclic GC pass will eventually occur, in case the strings
are being referenced from a cycle that can be cleaned up.

I don't know whether/how re uses string objects internally while it's
matching, so I can't say what its garbage collection characteristics
might be when matching against a huge string. The behaviour you observed
might have been due to the nature of the re being matched -- some res
can have quadratic or exponential behaviour all by themselves.

--
Greg
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Thu, Jun 26, 2008 at 12:01 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> Would it be possible, if not a good idea, to only track object
>> deallocations as the GC traversal trigger? As far as I know, dangling
>> cyclic references cannot be formed when allocating objects.
>
> Not sure what you mean by that.
>
> x = []
> x.append(x)
> del x
>
> creates a cycle with no deallocation occurring.

Oh... never mind then.

--
Alexandre
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> Would it be possible, if not a good idea, to only track object
> deallocations as the GC traversal trigger? As far as I know, dangling
> cyclic references cannot be formed when allocating objects.

Not sure what you mean by that.

x = []
x.append(x)
del x

creates a cycle with no deallocation occurring.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Wed, Jun 25, 2008 at 4:55 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> I think exactly the other way 'round. The timing of things should not
> matter at all, only the exact sequence of allocations and deallocations.

Would it be possible, if not a good idea, to only track object
deallocations as the GC traversal trigger? As far as I know, dangling
cyclic references cannot be formed when allocating objects. So, this
could potentially mitigate the quadratic behavior during allocation
bursts.

--
Alexandre
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> I took the statement, "Current GC only takes into account container
> objects, which, most significantly, ignores string objects (of which
> most applications create plenty)" to mean that strings were ignored for
> deciding when to do garbage collection. I mistakenly thought that was
> because they were assumed to be small. It sounds like they're ignored
> because they're automatically collected and so they SHOULD be ignored
> for object garbage collection.

More precisely, a string object can never participate in a cycle (it can
be referenced from a cycle, but not be *in* the cycle, as it has no
references to other objects). GC in Python is only about container
objects (which potentially can be cyclic); non-container objects are
released when the refcount says they are no longer referenced.

Whether or not allocation of definitely-non-cyclic objects should still
trigger cyclic GC (in the hope that some objects hang on a garbage
cycle) is a question that is open to debate; I'd prefer an analysis of
existing applications before making decisions.

Regards,
Martin
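The distinction Martin draws can be checked directly from Python:
gc.is_tracked() (added in Python 2.6, so newer than the 2.5 interpreter
discussed earlier in the thread) reports whether an object participates
in cyclic GC. A small illustration:

import gc  # gc.is_tracked() is available from Python 2.6 onward

print gc.is_tracked("spam")  # False: a string holds no object references
print gc.is_tracked([])      # True: a list can participate in a cycle
print gc.is_tracked({})      # True: so can a dict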
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Wed, Jun 25, 2008 at 4:55 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > It seems to me that the root problem is allocation spikes of
> > legitimate, useful data. Perhaps then we need some sort of "test" to
> > determine if those are legitimate. Perhaps checking every nth (with n
> > decreasing as allocation bytes increase) object allocated during a
> > "spike" could be useful. Then delay garbage collection until x
> > consecutive objects are found to be garbage?
> >
> > It seems like we should be attacking the root cause rather than
> > finding some convoluted math that attempts to work for all scenarios.
>
> I think exactly the other way 'round. The timing of things should not
> matter at all, only the exact sequence of allocations and
> deallocations.
>
> I trust provable maths much more than I trust ad-hoc heuristics, even
> if you think the math is convoluted.

I probably chose my wording poorly (particularly for a
newcomer/outsider). What I meant was that the numbers used in GC
currently appear arbitrary. The idea of three "groups" (youngest, oldest
and middle) is also arbitrary. Would it not be better to tear that
system apart and create a "sliding" scale? If the timing method is
undesirable, then make it slide based on the allocation/deallocation
difference. In this way, the current breakpoints and number of groups
(all of which are arbitrary and fixed) could be replaced by one
coefficient (and yes, I recognize that it would also be arbitrary, but
it would be one tweakable number rather than several).

My gut tells me that your current fix is going to work just fine for
now, but we're going to end up tweaking it (or at least discussing
tweaking it) every 6-12 months.

> > On a side note, the information about not GCing on string objects is
> > interesting? Is there a way to override this behavior?
>
> I think you misunderstand. Python releases unused string objects just
> fine, and automatically. It doesn't even need GC for that.

I took the statement, "Current GC only takes into account container
objects, which, most significantly, ignores string objects (of which
most applications create plenty)" to mean that strings were ignored for
deciding when to do garbage collection. I mistakenly thought that was
because they were assumed to be small. It sounds like they're ignored
because they're automatically collected and so they SHOULD be ignored
for object garbage collection.

Thanks for the clarification... Back to the drawing board on my other
problem ;)

--
Haikus are easy
Most make very little sense
Refrigerator
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> It seems to me that the root problem is allocation spikes of
> legitimate, useful data. Perhaps then we need some sort of "test" to
> determine if those are legitimate. Perhaps checking every nth (with n
> decreasing as allocation bytes increase) object allocated during a
> "spike" could be useful. Then delay garbage collection until x
> consecutive objects are found to be garbage?
>
> It seems like we should be attacking the root cause rather than finding
> some convoluted math that attempts to work for all scenarios.

I think exactly the other way 'round. The timing of things should not
matter at all, only the exact sequence of allocations and deallocations.

I trust provable maths much more than I trust ad-hoc heuristics, even if
you think the math is convoluted.

> On a side note, the information about not GCing on string objects is
> interesting? Is there a way to override this behavior?

I think you misunderstand. Python releases unused string objects just
fine, and automatically. It doesn't even need GC for that.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
It seems to me that the root problem is allocation spikes of legitimate,
useful data. Perhaps then we need some sort of "test" to determine if
those are legitimate. Perhaps checking every nth (with n decreasing as
allocation bytes increase) object allocated during a "spike" could be
useful. Then delay garbage collection until x consecutive objects are
found to be garbage?

It seems like we should be attacking the root cause rather than finding
some convoluted math that attempts to work for all scenarios.

On a side note, the information about not GCing on string objects is
interesting. Is there a way to override this behavior? I've found that
re.py chokes on large text files (4MB plus) without line ends (don't
ask, they're not our files but we have to parse them). I wonder if this
isn't the reason... If the answer to that is, "no, strings are always
ignored," I'd recommend rethinking this (or providing an option to
override somehow).

--
Haikus are easy
Most make very little sense
Refrigerator
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Martin v. Löwis writes:
> I'd like to see in an experiment whether this is really true.

Right, all those ideas should be implemented and tried out. I don't
really have time to spend on it right now. Also, what's missing is a
suite of performance/efficiency tests for the garbage collector.
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> It would not help the quadratic behaviour - and is orthogonal to your
> proposal - but at least avoid calling the GC too often when lots of
> small objects are allocated (as opposed to lots of large objects).

I'd like to see in an experiment whether this is really true. Current GC
only takes into account container objects, which, most significantly,
ignores string objects (of which most applications create plenty). If
you make actual memory consumption a trigger, the string objects would
count towards triggering GC.

That may or may not be the better approach, anyway - but it's not
certain (for me) that such a scheme would cause GC to be triggered less
often, in a typical application.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Martin v. Löwis writes:
> Currently, only youngest collections are triggered by allocation
> rate; middle and old are triggered by frequency of youngest collection.
> So would you now specify that the youngest collection should occur
> if-and-only-if a new arena is allocated? Or discount arenas returned
> from arenas allocated?

The latter sounds reasonable. IIRC an arena is 256KB, which is less than
an entry-level L2 cache. Therefore, waiting for an arena to be filled
shouldn't deteriorate cache locality a lot. To avoid situations where
the GC is never called, we could combine that with an allocation
counter, but with a much higher threshold than currently.

> Or apply this to triggering other generation
> collections but youngest? How would that help the quadratic behavior
> (which really needs to apply a factor somewhere)?

It would not help the quadratic behaviour - and is orthogonal to your
proposal - but at least avoid calling the GC too often when lots of
small objects are allocated (as opposed to lots of large objects).
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> pymalloc needing to allocate a new arena would be a different way to
> track an excess of allocations over deallocations, and in some ways
> more sensible (since it would reflect an excess of /bytes/ allocated
> over bytes freed, rather than an excess in the counts of objects
> allocated-over-freed regardless of their sizes -- an implication is,
> e.g., that cyclic gc would be triggered much less frequently by mass
> creation of small tuples than of small dicts, since a small tuple
> consumes much less memory than a small dict).
>
> Etc. ;-)

:-)

So my question still is: how exactly? Currently, only youngest
collections are triggered by allocation rate; middle and old are
triggered by frequency of youngest collection. So would you now specify
that the youngest collection should occur if-and-only-if a new arena is
allocated? Or discount arenas returned from arenas allocated? Or apply
this to triggering other generation collections but youngest? How would
that help the quadratic behavior (which really needs to apply a factor
somewhere)?

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
[Antoine Pitrou]
>> Would it be helpful if the GC was informed of memory growth by the
>> Python memory allocator (that is, each time it either asks or gives
>> back a block of memory to the system allocator)?

[Martin v. Löwis]
> I don't see how. The garbage collector is already informed about memory
> growth; it learns exactly when a container object is allocated or
> deallocated. That the allocator then requests memory from the system
> only confirms what the garbage collector already knew: that there are
> lots of allocated objects. From that, one could infer that it might
> be time to perform garbage collection - or one could infer that all
> the objects are really useful, and no garbage can be collected.

Really the same conundrum we currently face: cyclic gc is currently
triggered by reaching a certain /excess/ of allocations over
deallocations. From that we /do/ infer it's time to perform garbage
collection -- but, as some examples here showed, it's sometimes really
the case that the true meaning of the excess is that "all the objects
are really useful, and no garbage can be collected -- and I'm creating a
lot of them".

pymalloc needing to allocate a new arena would be a different way to
track an excess of allocations over deallocations, and in some ways more
sensible (since it would reflect an excess of /bytes/ allocated over
bytes freed, rather than an excess in the counts of objects
allocated-over-freed regardless of their sizes -- an implication is,
e.g., that cyclic gc would be triggered much less frequently by mass
creation of small tuples than of small dicts, since a small tuple
consumes much less memory than a small dict).

Etc. ;-)
Re: [Python-Dev] C API for gc.enable() and gc.disable()
"Martin v. Löwis" writes: > > XEmacs implements this strategy in a way which is claimed to give > > constant amortized time (ie, averaged over memory allocated). > > See my recent proposal. I did, crossed in the mail. To the extent that I understand both systems, your proposal looks like an improvement over what we've got. > > However, isn't the real question whether there is memory pressure or > > not? If you've got an unloaded machine with 2GB of memory, even a 1GB > > spike might have no observable consequences. How about a policy of > > GC-ing with decreasing period ("time" measured by bytes allocated or > > number of allocations) as the fraction of memory used increases, > > starting from a pretty large fraction (say 50% by default)? > > The problem with such an approach is that it is very difficult to > measure. On some systems it should be possible to get information about how much paging is taking place, which would be an indicator of pressure. > What to do about virtual memory? If you're in virtual memory, you're already in trouble. Once you're paging, you need to increase time between GCs, so the rule would not be monotonic. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Martin v. Löwis wrote:
> Antoine Pitrou wrote:
>> Le samedi 21 juin 2008 à 17:49 +0200, "Martin v. Löwis" a écrit :
>>> I don't think any strategies based on timing will be successful.
>>> Instead, one should count and analyze objects (although I'm unsure
>>> how exactly that could work).
>>
>> Would it be helpful if the GC was informed of memory growth by the
>> Python memory allocator (that is, each time it either asks or gives
>> back a block of memory to the system allocator)?
>
> I don't see how. The garbage collector is already informed about memory
> growth; it learns exactly when a container object is allocated or
> deallocated. That the allocator then requests memory from the system
> only confirms what the garbage collector already knew: that there are
> lots of allocated objects. From that, one could infer that it might be
> time to perform garbage collection - or one could infer that all the
> objects are really useful, and no garbage can be collected.

I was wondering whether it might be useful to detect the end of an
allocation spike: if PyMalloc incremented an internal counter each time
it requested more memory from the system, then the GC could check
whether that number had changed on each collection cycle. If the GC goes
a certain number of collection cycles without more memory being
requested from the system, then it can assume the allocation spike is
over and tighten up the default tuning again.

Such a pymalloc counter would provide a slightly more holistic view of
overall memory usage changes than the container-focused view of the
cyclic GC.

Cheers,
Nick.

--
Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia
http://www.boredomandlaziness.org
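A rough Python-level sketch of the bookkeeping Nick describes; every
name here is invented for illustration, and a real implementation would
live in C inside pymalloc and the collector:

# Hypothetical spike detector; `arena_count` stands in for a counter
# that pymalloc would bump each time it requests a new arena from the
# system allocator.
QUIET_CYCLES = 3  # collection cycles with no new arenas => spike over

class SpikeDetector(object):
    def __init__(self):
        self.last_seen = 0
        self.quiet = 0

    def after_collection(self, arena_count):
        """Return True when the allocation spike appears to be over."""
        if arena_count == self.last_seen:
            self.quiet += 1
        else:
            self.quiet = 0
            self.last_seen = arena_count
        return self.quiet >= QUIET_CYCLES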
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> XEmacs implements this strategy in a way which is claimed to give
> constant amortized time (ie, averaged over memory allocated).

See my recent proposal. The old trick is to do reorganizations in a
fixed fraction of the total size, resulting in a per-increase
amortized-constant overhead (assuming each reorganization takes time
linear with total size).

> However, isn't the real question whether there is memory pressure or
> not? If you've got an unloaded machine with 2GB of memory, even a 1GB
> spike might have no observable consequences. How about a policy of
> GC-ing with decreasing period ("time" measured by bytes allocated or
> number of allocations) as the fraction of memory used increases,
> starting from a pretty large fraction (say 50% by default)?

The problem with such an approach is that it is very difficult to
measure. What to do about virtual memory? What to do about other
applications that also consume memory?

On some systems (Windows in particular), the operating system indicates
memory pressure through some IPC mechanism; on such systems, it might be
reasonable to perform garbage collection (only?) when the system asks
for it. However, the system might not ask for GC while the swap space is
still not exhausted, meaning that the deferred GC would take a long time
to complete (having to page in every object).

> Nevertheless, I think the real solution has to be for Python
> programmers to be aware that there is GC, and that they can tune it.

I don't think there is a "real solution". I think programmers should
abstain from complaining if they can do something about the problem in
their own application (unless the complaint is formulated as a patch) -
wait - I think programmers should abstain from complaining, period.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
"Martin v. Löwis" writes: > Given the choice of "run slower" and "run out of memory", Python should > always prefer the former. > > One approach could be to measure how successful a GC run was: if GC > finds that more-and-more objects get allocated and very few (or none) > are garbage, it might conclude that this is an allocation spike, and > back off. The tricky question is how to find out that the spike is > over. XEmacs implements this strategy in a way which is claimed to give constant amortized time (ie, averaged over memory allocated). I forget the exact parameters, but ISTR it's just period ("time" measured by bytes allocated) is proportional to currently allocated memory. Some people claim this is much more comfortable than the traditional "GC after N bytes are allocated" algorithm but I don't notice much difference. I don't know how well this intuition carries over to noninteractive applications. In XEmacs experimenting with such strategies is pretty easy, since the function that determines period is only a few lines long. I assume that would be true of Python, too. However, isn't the real question whether there is memory pressure or not? If you've got an unloaded machine with 2GB of memory, even a 1GB spike might have no observable consequences. How about a policy of GC-ing with decreasing period ("time" measured by bytes allocated or number of allocations) as the fraction of memory used increases, starting from a pretty large fraction (say 50% by default)? The total amount of memory could be a soft limit, defaulting to the amount of fast memory actually available. For interactive and maybe some batch applications, it might be appropriate to generate a runtime warning as memory use approches some limits, too. Nevertheless, I think the real solution has to be for Python programmers to be aware that there is GC, and that they can tune it. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Antoine Pitrou wrote:
> Le samedi 21 juin 2008 à 17:49 +0200, "Martin v. Löwis" a écrit :
>> I don't think any strategies based on timing will be successful.
>> Instead, one should count and analyze objects (although I'm unsure
>> how exactly that could work).
>
> Would it be helpful if the GC was informed of memory growth by the
> Python memory allocator (that is, each time it either asks or gives
> back a block of memory to the system allocator)?

I don't see how. The garbage collector is already informed about memory
growth; it learns exactly when a container object is allocated or
deallocated. That the allocator then requests memory from the system
only confirms what the garbage collector already knew: that there are
lots of allocated objects. From that, one could infer that it might be
time to perform garbage collection - or one could infer that all the
objects are really useful, and no garbage can be collected.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Le samedi 21 juin 2008 à 17:49 +0200, "Martin v. Löwis" a écrit :
> I don't think any strategies based on timing will be successful.
> Instead, one should count and analyze objects (although I'm unsure
> how exactly that could work).

Would it be helpful if the GC was informed of memory growth by the
Python memory allocator (that is, each time it either asks or gives back
a block of memory to the system allocator)?
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Kevin Jacobs <[EMAIL PROTECTED]> wrote:
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single
> one would think to question the garbage collector if they observed the
> kind of quadratic time complexity I've demonstrated. This is not
> because they are stupid, but because they have only a vague idea that
> Python even has a garbage collector, never mind that it could be
> behaving badly for such innocuous looking code.

As I understand it, gc is needed now more than ever because new-style
classes make reference cycles more common. On the other hand, greatly
increased RAM size (from some years ago) makes megaobject bursts
possible. Such large bursts move the hidden quadratic do-nothing drag
out of the relatively flat part of the curve (total time just double or
triple what it should be) to where it can really bite.

Leaving aside what you do for your local group, can we better warn
Python programmers now, for the upcoming 2.5, 2.6, and 3.0 releases?

Paragraph 3 of the Reference Manual chapter on Data Model (3.0 version)
says:

"Objects are never explicitly destroyed; however, when they become
unreachable they may be garbage-collected. An implementation is allowed
to postpone garbage collection or omit it altogether — it is a matter of
implementation quality how garbage collection is implemented, as long as
no objects are collected that are still reachable. (Implementation note:
the current implementation uses a reference-counting scheme with
(optional) delayed detection of cyclically linked garbage, which
collects most objects as soon as they become unreachable, but is not
guaranteed to collect garbage containing circular references. See the
documentation of the gc module for information on controlling the
collection of cyclic garbage.)"

I am not sure what to add here (especially for those who do not read it
;-).

The Library Manual gc section says: "Since the collector supplements the
reference counting already used in Python, you can disable the collector
if you are sure your program does not create reference cycles." Perhaps
it should also say: "You should disable it when creating millions of
objects without cycles."

The installed documentation set (on Windows, at least) includes some
Python HOWTOs. If one were added on Space Management (implementations,
problems, and solutions), would your developers read it?

> Maybe we should consider more carefully before declaring the status quo
> sufficient. Average developers do allocate millions of objects in
> bursts and super-linear time complexity for such operations is not
> acceptable. Thankfully I am around to help my programmers work around
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or
> whatever since Python was inexplicably "too slow" for "real work". This
> being open source, I'm certainly willing to help in the effort to do
> so, but not if potential solutions will be ruled out as being
> unnecessary.

To me, 'sufficient' (time-dependent) and 'necessary' are either too
vague or too strict to bring about what you want -- change. This is the
third thread I have read (here + c.l.p) on default-mode gc problems (but
all in the last couple of years or so). So, especially with the nice
table someone posted recently, on time with and without gc, and
considering that installed RAM continues to grow, I am persuaded that a
default behavior improvement that does not negatively impact the vast
majority would be desirable.
Terry Jan Reedy
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> > What follows from that? To me, the natural conclusion is "people who
> > witness performance problems just need to despair, or accept them, as
> > they can't do anything about it", however, I don't think this is the
> > conclusion that you had in mind.
>
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single
> one would think to question the garbage collector if they observed the
> kind of quadratic time complexity I've demonstrated. This is not
> because they are stupid, but because they have only a vague idea that
> Python even has a garbage collector, never mind that it could be
> behaving badly for such innocuous looking code.

Perhaps this is something documentation could help with. I'm thinking of
a one-page checklist listing places they might look for performance
problems, that your programmers could work through.

Bill
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> I'm not sure I agree with this. GC IIRC was introduced primarily to
> alleviate *long-term* memory starvation.

I don't think that's historically the case. GC would not need to be
generational if releasing short-lived objects shortly after they become
garbage was irrelevant.

Of course, it was always expected that much memory is released through
mere reference counting, and that GC only kicks in "after some time".
However, "after some time" was changed from "after 5000 allocations" to
"after 700 allocations" in

r17274 | jhylton | 2000-09-05 17:44:50 +0200 (Tue, 05 Sep 2000) | 2 lines
Changed paths:
   M /python/trunk/Modules/gcmodule.c

compromise value for threshold0: not too high, not too low

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Sat, Jun 21, 2008, "Martin v. Löwis" wrote:
> In general, any solution of the "do GC less often" kind needs to deal
> with cases where lots of garbage gets produced in a short amount of
> time (e.g. in a tight loop), and which run out of memory when GC is
> done less often.
>
> Given the choice of "run slower" and "run out of memory", Python should
> always prefer the former.

I'm not sure I agree with this. GC IIRC was introduced primarily to
alleviate *long-term* memory starvation. You are now IMO adding a new
goal for GC that has not been previously articulated. I believe this
requires consensus rather than a simple declaration of principle.

Guido's opinion if he has one obviously overrules. ;-) Guido?

--
Aahz ([EMAIL PROTECTED])           <*>         http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> Idea 1: Allow GC to run automatically no more often than every n CPU
> seconds, n being perhaps 5 or 10.

I think it's very easy to exhaust the memory with such a policy, even
though much memory would still be available. Worse, in a program
producing a lot of garbage, performance will degrade significantly as
Python starts thrashing the swap space.

> Idea 2: Allow GC to run no more often than every f(n) CPU seconds,
> where n is the time taken by the last GC round.

How would that take the incremental GC into account? (i.e. what is "the
time taken by the last GC round"?) Furthermore, the GC run time might
well be under the resolution of the CPU seconds clock.

> These limits could be reset or scaled by the GC collecting more than n%
> of the generation 0 objects, or maybe by the number of PyMalloc arenas
> increasing by a certain amount?

I don't think any strategies based on timing will be successful.
Instead, one should count and analyze objects (although I'm unsure how
exactly that could work).

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Sat, Jun 21, 2008 at 11:20 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> In general, any solution of the "do GC less often" kind needs to deal
> with cases where lots of garbage gets produced in a short amount of
> time (e.g. in a tight loop), and which run out of memory when GC is
> done less often.

Idea 1: Allow GC to run automatically no more often than every n CPU
seconds, n being perhaps 5 or 10.

Idea 2: Allow GC to run no more often than every f(n) CPU seconds, where
n is the time taken by the last GC round.

These limits could be reset or scaled by the GC collecting more than n%
of the generation 0 objects, or maybe by the number of PyMalloc arenas
increasing by a certain amount?

-Kevin
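A toy rendering of Idea 1, assuming an explicit collection hook that
applications would have to call; time.clock() is used here because it
measures CPU time on most Unix systems, and nothing below is an actual
gc facility:

import gc
import time

MIN_CPU_SECONDS = 5.0  # the "n" from Idea 1; purely illustrative
_last_run = [time.clock()]

def maybe_collect():
    # Run a collection only if at least MIN_CPU_SECONDS of CPU time
    # have elapsed since the previous one.
    now = time.clock()
    if now - _last_run[0] >= MIN_CPU_SECONDS:
        gc.collect()
        _last_run[0] = now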
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> Well, they could hang themselves or switch to another language (which
> some people might view as equivalent :-)), but perhaps optimistically
> the various propositions that were sketched out in this thread (by Adam
> Olsen and Greg Ewing) could bring an improvement. I don't know how
> realistic they are; perhaps an expert would have an answer.

In general, any solution of the "do GC less often" kind needs to deal
with cases where lots of garbage gets produced in a short amount of time
(e.g. in a tight loop), and which run out of memory when GC is done less
often.

Given the choice of "run slower" and "run out of memory", Python should
always prefer the former.

One approach could be to measure how successful a GC run was: if GC
finds that more and more objects get allocated and very few (or none)
are garbage, it might conclude that this is an allocation spike, and
back off. The tricky question is how to find out that the spike is over.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single
> one would think to question the garbage collector if they observed the
> kind of quadratic time complexity I've demonstrated. This is not
> because they are stupid, but because they have only a vague idea that
> Python even has a garbage collector, never mind that it could be
> behaving badly for such innocuous looking code.
>
> Maybe we should consider more carefully before declaring the status quo
> sufficient.

This was precisely my question: what follows from the above observation?
I personally didn't declare the status quo sufficient - I merely
described it as the status quo.

> Average developers do allocate millions of objects in
> bursts and super-linear time complexity for such operations is not
> acceptable. Thankfully I am around to help my programmers work around
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or
> whatever since Python was inexplicably "too slow" for "real work". This
> being open source, I'm certainly willing to help in the effort to do
> so, but not if potential solutions will be ruled out as being
> unnecessary.

I wouldn't rule out solutions as being unnecessary. I might rule out
solutions that negatively impact existing software, for the sake of
improving other existing software.

Unfortunately, the only way to find out whether a solution will be ruled
out is to propose it first. Only then you'll see what response you get.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Sat, Jun 21, 2008 at 4:33 AM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>> I don't think expecting people to tweak gc parameters when they
>> witness performance problems is reasonable.
>
> What follows from that? To me, the natural conclusion is "people who
> witness performance problems just need to despair, or accept them, as
> they can't do anything about it", however, I don't think this is the
> conclusion that you had in mind.

I can say with complete certainty that of the 20+ programmers I've had
working for me, many of whom have used Python for 3+ years, not a single
one would think to question the garbage collector if they observed the
kind of quadratic time complexity I've demonstrated. This is not because
they are stupid, but because they have only a vague idea that Python
even has a garbage collector, never mind that it could be behaving badly
for such innocuous-looking code.

Maybe we should consider more carefully before declaring the status quo
sufficient. Average developers do allocate millions of objects in
bursts, and super-linear time complexity for such operations is not
acceptable. Thankfully I am around to help my programmers work around
such issues, or else they'd be pushing to switch to Java, Ruby, C#, or
whatever, since Python was inexplicably "too slow" for "real work". This
being open source, I'm certainly willing to help in the effort to do so,
but not if potential solutions will be ruled out as being unnecessary.

-Kevin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Le samedi 21 juin 2008 à 10:33 +0200, "Martin v. Löwis" a écrit :
>> I don't think expecting people to tweak gc parameters when they
>> witness performance problems is reasonable.
>
> What follows from that? To me, the natural conclusion is "people who
> witness performance problems just need to despair, or accept them, as
> they can't do anything about it",

To me, Amaury's answer implies that most people can't do anything about
it, indeed. Well, they could hang themselves or switch to another
language (which some people might view as equivalent :-)), but perhaps
optimistically the various propositions that were sketched out in this
thread (by Adam Olsen and Greg Ewing) could bring an improvement. I
don't know how realistic they are; perhaps an expert would have an
answer.

Regards

Antoine.
Re: [Python-Dev] C API for gc.enable() and gc.disable()
> I don't think expecting people to tweak gc parameters when they witness
> performance problems is reasonable.

What follows from that? To me, the natural conclusion is "people who
witness performance problems just need to despair, or accept them, as
they can't do anything about it"; however, I don't think this is the
conclusion that you had in mind.

Regards,
Martin
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Le vendredi 20 juin 2008 à 17:44 +0200, Amaury Forgeot d'Arc a écrit :
> In short: the gc is tuned for typical usage. If your usage of python
> is specific, use gc.set_threshold and increase its values.

It's fine for people "in the know" who take the time to test their code
using various gc parameters. But most people don't (I know I never did,
and until recently I didn't even know the gc could have such a
disastrous effect on performance). I don't think expecting people to
tweak gc parameters when they witness performance problems is
reasonable.
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Fri, Jun 20, 2008 at 9:44 AM, Amaury Forgeot d'Arc <[EMAIL PROTECTED]> wrote:
> 2008/6/20 Kevin Jacobs <[EMAIL PROTECTED]>:
>> On Fri, Jun 20, 2008 at 10:25 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>>> Kevin Jacobs writes:
>>>> +1 on a C API for enabling and disabling GC. I have several
>>>> instances where I create a large number of non-cyclic objects where
>>>> I see huge GC overhead (30+ seconds with gc enabled, 0.15 seconds
>>>> when disabled).
>>>
>>> Could you try to post a stripped-down, self-contained example of such
>>> behaviour?
>>
>> $ python -m timeit 'zip(*[range(100)]*5)'
>> 10 loops, best of 3: 496 msec per loop
>>
>> $ python -m timeit -s 'import gc; gc.enable()' 'zip(*[range(100)]*5)'
>> 10 loops, best of 3: 2.93 sec per loop
>
> I remember that a similar issue was discussed some months ago:
> http://bugs.python.org/issue2607
>
> In short: the gc is tuned for typical usage. If your usage of python
> is specific, use gc.set_threshold and increase its values.

For very large bursts of allocation, tuning is no different from
disabling it outright, and disabling is simpler/more reliable.

--
Adam Olsen, aka Rhamphoryncus
Re: [Python-Dev] C API for gc.enable() and gc.disable()
2008/6/20 Kevin Jacobs <[EMAIL PROTECTED]>:
> On Fri, Jun 20, 2008 at 10:25 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
>> Kevin Jacobs writes:
>>> +1 on a C API for enabling and disabling GC. I have several instances
>>> where I create a large number of non-cyclic objects where I see huge
>>> GC overhead (30+ seconds with gc enabled, 0.15 seconds when
>>> disabled).
>>
>> Could you try to post a stripped-down, self-contained example of such
>> behaviour?
>
> $ python -m timeit 'zip(*[range(100)]*5)'
> 10 loops, best of 3: 496 msec per loop
>
> $ python -m timeit -s 'import gc; gc.enable()' 'zip(*[range(100)]*5)'
> 10 loops, best of 3: 2.93 sec per loop

I remember that a similar issue was discussed some months ago:
http://bugs.python.org/issue2607

In short: the gc is tuned for typical usage. If your usage of python is
specific, use gc.set_threshold and increase its values.

--
Amaury Forgeot d'Arc
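For readers unfamiliar with the knob Amaury mentions, a minimal
illustration; the raised value is arbitrary and workload-dependent:

import gc

print gc.get_threshold()  # CPython's default is (700, 10, 10)

# Raise the generation-0 trigger so collections run far less often
# during allocation-heavy phases; 100000 is just an example value.
gc.set_threshold(100000, 10, 10)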
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Fri, Jun 20, 2008 at 10:25 AM, Antoine Pitrou <[EMAIL PROTECTED]> wrote:
> Kevin Jacobs writes:
>> +1 on a C API for enabling and disabling GC. I have several instances
>> where I create a large number of non-cyclic objects where I see huge
>> GC overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).
>
> Could you try to post a stripped-down, self-contained example of such
> behaviour?

$ python -m timeit 'zip(*[range(100)]*5)'
10 loops, best of 3: 496 msec per loop

$ python -m timeit -s 'import gc; gc.enable()' 'zip(*[range(100)]*5)'
10 loops, best of 3: 2.93 sec per loop

Note that timeit cheats and disables GC by default. Attached is a less
stripped-down script demonstrating the super-linear behavior of somewhat
naively coded transpose operators. The output is listed below:

FUNCTION              ROWS    COLUMNS   GC ENABLED  GC DISABLED
--------------- ---------- ---------- ------------ ------------
transpose_comp           5          0       0.0000       0.0000
transpose_comp           5         25       0.5535       0.3537
transpose_comp           5         50       1.4359       0.6868
transpose_comp           5         75       2.7148       1.0760
transpose_comp           5        100       3.8070       1.3936
transpose_comp           5        125       5.5184       1.7617
transpose_comp           5        150       7.8828       2.1308
transpose_comp           5        175       9.3279       2.5364
transpose_comp           5        200      11.8248       2.7399
transpose_comp           5        225      14.7436       3.1585
transpose_comp           5        250      18.4452       3.5818
transpose_comp           5        275      21.4856       3.8988
transpose_comp           5        300      24.4110       4.3148
transpose_zip            5          0       0.0000       0.0000
transpose_zip            5         25       0.2537       0.0658
transpose_zip            5         50       0.8380       0.1324
transpose_zip            5         75       1.7507       0.1989
transpose_zip            5        100       2.6169       0.2648
transpose_zip            5        125       4.0760       0.3317
transpose_zip            5        150       5.8852       0.4145
transpose_zip            5        175       7.3925       0.5161
transpose_zip            5        200      10.0755       0.6708
transpose_zip            5        225      14.2698       0.7760
transpose_zip            5        250      16.7291       0.9022
transpose_zip            5        275      20.3833       1.0179
transpose_zip            5        300      24.5515       1.0971

Hope this helps,
-Kevin

import gc
import time

def transpose_comp(rows):
    return [ [ row[i] for row in rows ] for i in xrange(len(rows[0])) ]

def transpose_zip(rows):
    return zip(*rows)

def bench(func, rows, cols):
    gc.enable()
    gc.collect()
    data = [ range(cols) ]*rows
    t0 = time.time()
    func(data)
    t1 = time.time()
    gc.disable()
    func(data)
    t2 = time.time()
    gc.enable()
    return t1-t0, t2-t1

print 'FUNCTION              ROWS    COLUMNS   GC ENABLED  GC DISABLED'
print '--------------- ---------- ---------- ------------ ------------'

rows = 5
for func in [transpose_comp, transpose_zip]:
    for cols in range(0, 301, 25):
        enabled, disabled = bench(func, rows, cols)
        print '%-15s %10d %10d %12.4f %12.4f' % \
            (func.func_name, rows, cols, enabled, disabled)
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Hi,

Kevin Jacobs writes:
> +1 on a C API for enabling and disabling GC. I have several instances
> where I create a large number of non-cyclic objects where I see huge GC
> overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).

Could you try to post a stripped-down, self-contained example of such
behaviour?

Antoine.
Re: [Python-Dev] C API for gc.enable() and gc.disable()
+1 on a C API for enabling and disabling GC. I have several instances
where I create a large number of non-cyclic objects where I see huge GC
overhead (30+ seconds with gc enabled, 0.15 seconds when disabled).

+1000 to fixing the garbage collector to be smart enough to regulate
itself better.

In the meantime, I use the following context manager to deal with the
hotspots as I find them:

import gc

class gcdisabled(object):
    '''
    Context manager to temporarily disable Python's cyclic garbage
    collector. The primary use is to avoid thrashing while allocating
    large numbers of non-cyclic objects due to an overly aggressive
    garbage collector behavior.

    Will disable GC if it is enabled upon entry and re-enable it upon
    exit:

    >>> gc.isenabled()
    True
    >>> with gcdisabled():
    ...     print gc.isenabled()
    False
    >>> print gc.isenabled()
    True

    Will not re-enable if GC was disabled upon entry:

    >>> gc.disable()
    >>> gc.isenabled()
    False
    >>> with gcdisabled():
    ...     gc.isenabled()
    False
    >>> gc.isenabled()
    False
    '''
    def __init__(self):
        self.isenabled = gc.isenabled()

    def __enter__(self):
        gc.disable()

    def __exit__(self, type, value, traceback):
        if self.isenabled:
            gc.enable()
Re: [Python-Dev] C API for gc.enable() and gc.disable()
Alexandre Vassalotti wrote:
> Do you have any idea how this behavior could be fixed? I am not a GC
> expert, but I could try to fix this.

Perhaps after making a GC pass you could look at the number of objects
reclaimed during that pass, and if it's less than some fraction of the
objects in existence, increase the threshold for performing GC by some
factor. You would also want to do the opposite, so that a GC pass which
reclaims a large proportion of objects would reduce the threshold back
down again.

--
Greg
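Greg's feedback rule, sketched at the Python level with invented
constants; a real implementation would adjust the collector's internal
thresholds in C, and the `reclaimed`/`alive` counts are assumed to come
from the collector itself:

import gc

SCALE = 2              # factor by which to raise/lower the threshold
LOW, HIGH = 0.05, 0.5  # the "some fraction" bounds, chosen arbitrarily
MIN_T, MAX_T = 700, 1000000

def retune_after_pass(reclaimed, alive):
    t0, t1, t2 = gc.get_threshold()
    if reclaimed < LOW * alive:       # pass was mostly wasted work
        t0 = min(t0 * SCALE, MAX_T)   # collect less often
    elif reclaimed > HIGH * alive:    # pass found plenty of garbage
        t0 = max(t0 // SCALE, MIN_T)  # tighten back down
    gc.set_threshold(t0, t1, t2)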
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Thu, Jun 19, 2008 at 3:23 PM, Alexandre Vassalotti <[EMAIL PROTECTED]> wrote:
> On Sun, Jun 1, 2008 at 12:28 AM, Adam Olsen <[EMAIL PROTECTED]> wrote:
>> On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
>> <[EMAIL PROTECTED]> wrote:
>>> Would anyone mind if I did add a public C API for gc.disable() and
>>> gc.enable()? I would like to use it as an optimization for the pickle
>>> module (I found out that I get a good 2x speedup just by disabling
>>> the GC while loading large pickles). Of course, I could simply import
>>> the gc module and call the functions there, but that seems overkill
>>> to me. I included the patch below for review.
>>
>> I'd rather see it fixed. It behaves quadratically if you load enough
>> to trigger full collection a few times.
>
> Do you have any idea how this behavior could be fixed? I am not a GC
> expert, but I could try to fix this.

Not sure. For long-running programs we actually want a quadratic cost,
but spread out over a long period of time. It's the bursts of allocation
that shouldn't be quadratic.

So maybe balance it by the total number of allocations that have
happened. If the total number of allocations is less than 10 times the
allocations that weren't promptly deleted, assume it's a burst. If it's
more than 10 times, assume it's over a long period.

--
Adam Olsen, aka Rhamphoryncus
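Adam's 10x rule, written out with invented counter names purely for
illustration:

def looks_like_burst(total_allocations, surviving_allocations):
    # Burst: most of what was recently allocated is still alive, i.e.
    # the total is less than 10 times the survivors. Over a long run,
    # the total dwarfs the survivors and the heuristic flips.
    return total_allocations < 10 * surviving_allocations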
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Sun, Jun 1, 2008 at 12:28 AM, Adam Olsen <[EMAIL PROTECTED]> wrote:
> On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
> <[EMAIL PROTECTED]> wrote:
>> Would anyone mind if I did add a public C API for gc.disable() and
>> gc.enable()? I would like to use it as an optimization for the pickle
>> module (I found out that I get a good 2x speedup just by disabling the
>> GC while loading large pickles). Of course, I could simply import the
>> gc module and call the functions there, but that seems overkill to me.
>> I included the patch below for review.
>
> I'd rather see it fixed. It behaves quadratically if you load enough
> to trigger full collection a few times.

Do you have any idea how this behavior could be fixed? I am not a GC
expert, but I could try to fix this.

--
Alexandre
Re: [Python-Dev] C API for gc.enable() and gc.disable()
On Sat, May 31, 2008 at 10:11 PM, Alexandre Vassalotti
<[EMAIL PROTECTED]> wrote:
> Would anyone mind if I did add a public C API for gc.disable() and
> gc.enable()? I would like to use it as an optimization for the pickle
> module (I found out that I get a good 2x speedup just by disabling the
> GC while loading large pickles). Of course, I could simply import the
> gc module and call the functions there, but that seems overkill to me.
> I included the patch below for review.

I'd rather see it fixed. It behaves quadratically if you load enough to
trigger full collection a few times.

--
Adam Olsen, aka Rhamphoryncus
[Python-Dev] C API for gc.enable() and gc.disable()
Would anyone mind if I did add a public C API for gc.disable() and
gc.enable()? I would like to use it as an optimization for the pickle
module (I found out that I get a good 2x speedup just by disabling the
GC while loading large pickles). Of course, I could simply import the gc
module and call the functions there, but that seems overkill to me. I
included the patch below for review.

--
Alexandre

Index: Include/objimpl.h
===================================================================
--- Include/objimpl.h   (revision 63766)
+++ Include/objimpl.h   (working copy)
@@ -221,8 +221,10 @@
  * ==
  */
 
-/* C equivalent of gc.collect(). */
+/* C equivalent of gc.collect(), gc.enable() and gc.disable(). */
 PyAPI_FUNC(Py_ssize_t) PyGC_Collect(void);
+PyAPI_FUNC(void) PyGC_Enable(void);
+PyAPI_FUNC(void) PyGC_Disable(void);
 
 /* Test if a type has a GC head */
 #define PyType_IS_GC(t) PyType_HasFeature((t), Py_TPFLAGS_HAVE_GC)

Index: Modules/gcmodule.c
===================================================================
--- Modules/gcmodule.c  (revision 63766)
+++ Modules/gcmodule.c  (working copy)
@@ -1252,6 +1252,18 @@
 	return n;
 }
 
+void
+PyGC_Disable(void)
+{
+	enabled = 0;
+}
+
+void
+PyGC_Enable(void)
+{
+	enabled = 1;
+}
+
 /* for debugging */
 void
 _PyGC_Dump(PyGC_Head *g)
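For context, the optimization the patch is meant to support looks
roughly like the following when written at the Python level (a sketch of
the intended usage pattern, not the actual pickle change):

import gc
import pickle

def load_with_gc_paused(f):
    # Mirror of the intended C-level optimization: suspend cyclic GC
    # around a large allocation burst, then restore the previous state.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return pickle.load(f)
    finally:
        if was_enabled:
            gc.enable()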