On Jan 3, 2005, at 12:13 AM, Tim Peters wrote:

[Bob Ippolito]
Quite a few notable places in the Python sources expect realloc(...) to
relinquish some memory if the requested size is smaller than the
currently allocated size.

I don't know what "relinquish some memory" means. If it means something like "returns memory to the OS, so that the reported process size shrinks", then no, nothing in Python ever assumes that. That's simply because "returns memory to the OS" and "process size" aren't concepts in the C standard, and so nothing can be said about them in general -- neither in theory nor in practice, because platforms (OS+libc combos) vary so widely in behavior here.

As a pragmatic matter, I *expect* that a production-quality realloc()
implementation will at least be able to reuse released memory,
provided that the amount released is at least half the amount
originally malloc()'ed (and, e.g., reasonable buddy systems may not be
able to do better than that).

This is what I meant by relinquish (c/o merriam-webster):
a : to stop holding physically : RELEASE <slowly relinquished his grip on the bar>
b : to give over possession or control of : YIELD <few leaders willingly relinquish power>


Your expectation is not correct for Darwin's memory allocation scheme. It seems that Darwin creates allocations of immutable size. The only way ANY part of an allocation will ever be used by ANYTHING else is if free() is called with that allocation. free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. In that case, realloc() will create a new allocation of at least the requested size, copy the contents of the original allocation into the new allocation (probably with copy-on-write pages if it's large enough, so it might be cheap), and free() the original allocation. In the case where realloc() specifies a size that is not greater than the allocation's size, it will simply return the given allocation and cause no side effects whatsoever.
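
To make the policy described above concrete, here is a minimal sketch of a toy allocator that behaves the same way. It is not Darwin's code: the names toy_malloc, toy_realloc, toy_free and toy_capacity are hypothetical, and the size header is only an illustration device layered on the platform malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored in front of every block. The extra
   members only force strict alignment of the user pointer. */
typedef union {
    size_t capacity;      /* fixed for the lifetime of the block */
    long double pad_ld;
    long long pad_ll;
    void *pad_ptr;
} toy_header;

static void *toy_malloc(size_t size)
{
    toy_header *h = malloc(sizeof(toy_header) + size);
    if (h == NULL)
        return NULL;
    h->capacity = size;
    return h + 1;         /* user data starts after the header */
}

static void toy_free(void *p)
{
    if (p != NULL)
        free((toy_header *)p - 1);
}

static size_t toy_capacity(const void *p)
{
    assert(p != NULL);
    return ((const toy_header *)p - 1)->capacity;
}

/* Models the behavior described in the message: a request that is
   not larger than the block returns the block untouched; a larger
   request allocates, copies, and frees the original. */
static void *toy_realloc(void *p, size_t size)
{
    if (p == NULL)
        return toy_malloc(size);
    size_t cap = toy_capacity(p);
    if (size <= cap)
        return p;         /* nothing is released */
    void *q = toy_malloc(size);
    if (q == NULL)
        return NULL;      /* original block stays valid */
    memcpy(q, p, cap);
    toy_free(p);
    return q;
}
```

With this model, shrinking a 1024-byte block to 16 bytes hands back the same pointer and leaves all 1024 bytes tied up until toy_free() is called.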

Was this a good decision? Probably not! However, it is our (in the "I know you use Windows but I am not the only one who uses Mac OS X" sense) problem so long as Darwin is a supported platform, because it is highly unlikely that Apple will backport any "fix" to the allocator unless we can prove it has some security implications in software shipped with their OS. I attempted to look for some easy ones by performing a quick audit of Apache, OpenSSH, and OpenSSL. Unfortunately, their developers did not share your expectation. I found one sprintf-like routine in Apache that could be affected by this behavior, and one instance of immutable string creation in Apple's CoreFoundation CFString implementation, but I have yet to find an easy way to exploit this behavior from the outside. I should probably be looking at PHP and Perl instead ;)

but I figure Darwin does this as an "optimization" and because Darwin
probably can't resize mmap'ed memory (at least it can't from Python,
but this probably means it doesn't have this capability at all).

It is possible to "fix" this for Darwin,

I don't understand what's "broken". Small objects go thru Python's own allocator, which has its own realloc policies and its own peculiarities (chiefly that pymalloc never free()s any memory allocated for small objects).

What's broken is that there are several places in Python that seem to assume that you can allocate a large chunk of memory, and make it smaller in some meaningful way with realloc(...). This is not true with Darwin. You are right about small objects. They don't matter because they're small, and because they're handled by Python's allocator.


because you can ask the default malloc zone how big a particular
allocation is, and how big an allocation of a given size will actually
be (see: <malloc/malloc.h>).
The obvious place to put this would be PyObject_Realloc, because this
is at least called by _PyString_Resize (which will fix
<http://python.org/sf/1092502>).
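
A rough sketch of what such a workaround might look like follows. It is not the actual patch and not CPython code: shrinking_realloc is a hypothetical name, and the "at most half" threshold is an arbitrary assumption borrowed from the expectation stated earlier in the thread. malloc_size() and malloc_good_size() are the <malloc/malloc.h> calls referred to above; on other platforms the sketch simply defers to realloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#ifdef __APPLE__
#include <malloc/malloc.h>
#endif

/* Hypothetical helper: realloc() that really gives memory back when
   a block shrinks a lot, even if the platform realloc() would not. */
static void *shrinking_realloc(void *p, size_t new_size)
{
    assert(new_size > 0);
#ifdef __APPLE__
    if (p != NULL) {
        size_t have = malloc_size(p);             /* real size of the block */
        size_t want = malloc_good_size(new_size); /* size a fresh block would get */
        /* Assumed policy: only pay for a copy when at least half
           of the block would be released. */
        if (want <= have / 2) {
            void *q = malloc(new_size);
            if (q == NULL)
                return p;   /* the old, larger block still satisfies the request */
            memcpy(q, p, new_size);  /* new_size <= want < have, so this is in bounds */
            free(p);
            return q;
        }
    }
#endif
    return realloc(p, new_size);
}
```

Callers would use it exactly like realloc(); the contents up to the new size are preserved either way.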

The diagnosis in the bug report seems to leave it pointing at socket.py's _fileobject.read(), although I suspect the real cause is in socketmodule.c's sock_recv(). We've had other reports of various problems when people pass absurdly large values to socket recv(). A better fix here would probably amount to rewriting sock_recv() to refuse to pass enormous numbers to the platform recv() (it appears that many platform recv() implementations simply don't expect a recv() argument to be much bigger than the native network buffer size, and screw up when that's not so).
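
One way to read "refuse to pass enormous numbers" is to clamp the request before allocating the buffer, which is allowed because recv() may always return fewer bytes than asked for. The sketch below is only an illustration of that idea, not socketmodule.c: recv_capped and the 64 KB MAX_RECV_CHUNK cap are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Assumed cap; a real value would be tuned per platform. */
enum { MAX_RECV_CHUNK = 64 * 1024 };

/* Hypothetical helper: receives at most MAX_RECV_CHUNK bytes, no
   matter how large `requested` is. On success returns the byte
   count and stores a malloc()'ed buffer in *out (caller frees).
   Returns -1 on allocation or recv() failure. */
static ssize_t recv_capped(int fd, size_t requested, char **out)
{
    assert(out != NULL);
    size_t n = requested < MAX_RECV_CHUNK ? requested : MAX_RECV_CHUNK;
    char *buf = malloc(n > 0 ? n : 1);
    if (buf == NULL)
        return -1;
    ssize_t got = recv(fd, buf, n, 0);
    if (got < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return got;
}
```

Because the buffer is never larger than the cap, there is no huge allocation left over to shrink afterwards, whatever the platform realloc() does.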

You are correct. The real cause is in sock_recv(), and/or _PyString_Resize(), depending on how you look at it.


Note that all versions of Darwin that I've looked at (6.x, 7.x, and
8.0b1 corresponding to publicly available WWDC 2004 Tiger code) have
this "issue", but it might go away by Mac OS X 10.4 or some later
release.

It would be good to rewrite sock_recv() more defensively in any case. Best I can tell, this implementation of realloc() is standard-conforming but uniquely brain dead in its downsize behavior.

Presumably this can happen at other places (including third party extensions), so a better place to do this might be _PyString_Resize(). list_resize() is another reasonable place to put this. I'm sure there are other places that use realloc() too, and the majority of them do this through obmalloc. So maybe instead of trying to track down all the places where this can manifest, we should just "gunk up" Python and patch PyObject_Realloc()? Since we are both pretty confident that other allocators aren't like Darwin, this "gunk" can be #ifdef'ed to the __APPLE__ case.


I don't expect the latter will last (as you say on your page,
"probably plenty of other software" also makes the same pragmatic
assumptions about realloc downsize behavior), so I'm not keen to gunk
up Python to worm around it.

As I said above, I haven't yet found any other software that makes the same kind of realloc() assumptions that Python does. I'm sure I'll find something, but what's important to me is that Python works well on Mac OS X, so something should happen. If we can't prove that Apple's allocation strategy is a security flaw in some service that ships with the OS, any improvements to this strategy are very unlikely to be backported to current versions of Mac OS X.


-bob

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev