On Jan 3, 2005, at 2:16 AM, Tim Peters wrote:

[Bob Ippolito]
...
Your expectation is not correct for Darwin's memory allocation scheme.
It seems that Darwin creates allocations of immutable size.  The only
way ANY part of an allocation will ever be used by ANYTHING else is if
free() is called with that allocation.

Ya, I understood that. My conclusion was that Darwin's realloc() implementation isn't production-quality. So it goes.

Whatever that means.

free() can be called either explicitly, or implicitly by calling realloc() with
a size larger than the size of the allocation. In that case, it will create a new
allocation of at least the requested size, copy the contents of the
original allocation into the new one (probably with
copy-on-write pages if it's large enough, so it might be cheap), and
free() the original allocation.

Really? Another near-universal "quality of implementation" expectation is that a growing realloc() will strive to extend in-place. Like realloc(malloc(1000000), 1000001). For example, the theoretical guarantee that one-at-a-time list.append() has amortized linear time doesn't depend on that, but pragmatically it's greatly helped by a reasonable growing realloc() implementation.

I said that it created allocations of fixed size, not that it created allocations of exactly the size you asked it for. Yes, it will extend in-place in many cases, including the given example.


In the case where realloc() specifies a size that is not greater than the
allocation's size, it will simply return the given allocation and cause no
side effects whatsoever.


Was this a good decision? Probably not!

Sounds more like a bug (or two) to me than "a decision", but I don't know.

You said yourself that it is standards compliant ;) I have filed it as a bug, but it is unlikely to be backported to current versions of Mac OS X unless a case can be made that it is indeed a security flaw.


However, it is our (in the "I know you use Windows, but I am not the only
one that uses Mac OS X" sense) problem so long as Darwin is a supported
platform, because it is highly unlikely that Apple will backport any "fix" to
the allocator unless we can prove it has some security implications in
software shipped with their OS. ...

Is there any known case where Python performs poorly on this OS, for this reason, other than the "pass giant numbers to recv() and then shrink the string because we didn't get anywhere near that many bytes" case? Claiming rampant performance problems should require evidence too <wink>.

Known case? No. Do I want to search Python application-space to find one? No.


Presumably this can happen at other places (including third party
extensions), so a better place to do this might be _PyString_Resize().
list_resize() is another reasonable place to put this. I'm sure there
are other places that use realloc() too, and the majority of them do
this through obmalloc. So maybe instead of trying to track down all
the places where this can manifest, we should just "gunk up" Python and
patch PyObject_Realloc()?

There is no "choke point" for allocations in Python -- some places call the system realloc() directly. Maybe the latter matter on Darwin too, but maybe they don't. The scope of this hack spreads if they do. I have no idea how often realloc() is called directly by 3rd-party extension modules. It's called directly a lot in Zope's C code, but AFAICT only to grow vectors, never to shrink them.

In the case of Python, "some places" means "nowhere relevant". Four standard library extension modules relevant to the platform use realloc directly:


_sre
    Uses realloc only to grow buffers.
cPickle
    Uses realloc only to grow buffers.
cStringIO
    Uses realloc only to grow buffers.
regexpr
    Uses realloc only to grow buffers.

If Zope doesn't use the allocator that Python gives it, then it can deal with its own problems. I would expect most extensions to use Python's allocator.

Since we are both pretty confident that other allocators aren't like Darwin,
this "gunk" can be #ifdef'ed to the __APPLE__ case.

#ifdef's are a last resort: they almost never go away, so they complicate the code forever after, and typically stick around for years even after the platform problems they were intended to address have been fixed. For obvious reasons, they're also an endless source of platform-specific bugs.

They're also the only good way to deal with platform-specific inconsistencies. In this specific case, it's not even possible to determine if a particular allocator implementation is stupid or not without at least using a platform-allocator-specific function to query the size reserved by a given allocation.


Note that pymalloc already does a memcpy+free when, in
PyObject_Realloc(p, n), p was obtained from the system malloc or
realloc but n is small enough to meet the "small object" threshold
(pymalloc "takes over" small blocks that result from a
PyObject_Realloc()).  That's a reasonable strategy *because* n is
always small in such cases.  If you're going to extend this strategy
to n of arbitrary size, then you may also create new performance
problems for some apps on Darwin (copying n bytes can get arbitrarily
expensive).

There's obviously a tradeoff between copying lots of bytes and letting lots of memory go to waste. That should be taken into account when considering how many pages could be returned to the allocator. Note that we can ask the allocator how much memory an allocation has actually reserved (which is usually somewhat larger than the amount you asked for) and how much memory it would reserve for a given size.

An allocation resize wouldn't even show up as smaller unless at least one page would be freed (for sufficiently large allocations, anyway; the minimum granularity is 16 bytes, since that alignment is guaranteed). Obviously, if an allocation spans a lot of pages anyway, one page isn't a big deal, so we would probably only resort to memcpy()/free() if some fair percentage of the total pages used by the allocation could be rescued.


If it does end up causing some real performance problems anyway, there are always deeper hacks like using vm_copy(), a Darwin-specific function that does a copy-on-write copy instead (which only makes sense if the allocation is big enough for this to actually be a performance improvement).
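A deeper-hack copy helper along those lines might look like this. It is a sketch, not production code: `cow_copy` is a hypothetical name, and the vm_copy() path (which requires page-aligned, page-multiple arguments) simply falls back to memcpy() whenever those conditions don't hold or the call fails, so the same code compiles and runs on non-Darwin platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__APPLE__)
#include <mach/mach.h>
#endif

/* Copy n bytes, preferring Darwin's copy-on-write vm_copy() for large,
 * page-aligned regions; plain memcpy() otherwise.  (Illustrative sketch.) */
void cow_copy(void *dst, const void *src, size_t n)
{
#if defined(__APPLE__)
    if (n >= (size_t)vm_page_size &&
        ((uintptr_t)src % vm_page_size) == 0 &&
        ((uintptr_t)dst % vm_page_size) == 0 &&
        (n % vm_page_size) == 0 &&
        vm_copy(mach_task_self(), (vm_address_t)src, n,
                (vm_address_t)dst) == KERN_SUCCESS)
        return;                  /* pages shared copy-on-write, no byte copy */
#endif
    memcpy(dst, src, n);         /* portable fallback */
}
```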

...
I'm sure I'll find something, but what's important to me is that Python
works well on Mac OS X, so something should happen.

I agree the socket-abuse case should be fiddled, and for more reasons than just Darwin's realloc() quirks. I don't know that there are actual problems on Darwin broader than that case (and I'm not challenging you to contrive one, I'm asking whether realloc() quirks are suspected in any other case that's known). Part of what you demonstrated when you said that pystone didn't slow down when you fiddled stuff is that pystone also didn't speed up. I also don't know that the memcpy+free wormaround is actually going to help more than it hurts overall. Yes, in the socket-abuse case, where the program routinely malloc()s strings millions of bytes larger than the socket can deliver, it would obviously help. That's not typical program behavior (however typical it may be of that specific app). More typical is shrinking a long list one element at a time, in which case about half of the remaining list would get memcpy'd from time to time, where such copies never get made today.

I do not yet know of another specific case where Darwin's realloc() implementation causes a problem.


The list case would certainly be a loss with current behavior if the list gets extremely large at some point, but then becomes small and stays that way for a long period of time.

IOW, there's no straightforward pure win here.

Well, at least we have a nice bug to report to Apple, whether or not we do something about it ourselves.


-bob

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
