On Fri, Apr 3, 2020 at 9:20 AM Paul Sokolovsky <pmis...@gmail.com> wrote:
> But not exactly. Let me humbly explain what's really a cost. It's
> looking at PyObject_HEAD
> https://swenson.github.io/python-xr/Include/object.h.html#line-78
> (damn, that's Python2 source, stupid google), and seeing that it's at
> least:
>
>     Py_ssize_t ob_refcnt;               \
>     struct _typeobject *ob_type;
>
> That's 2 word-sized fields, 16 bytes on a 64-bit machine. You can dig further
> and further, and understand how much memory it takes to store so-and-so
> kind of structure (and how it could be done differently).

That's fair, but the PyObject header isn't the only cost. The actual
data for a Python string isn't stored in the structure. How do you
know how much memory is being consumed by that? Are you 100% certain
that sys.getsizeof() is measuring that? It appears from the source
code that it *probably* is (str.__sizeof__ is defined in
unicodeobject.c), but it counts, for instance, the length of the UTF-8
representation (if present) plus one for null termination, and that's
quite possibly not the actual allocated size, due to overhead (and
possible alignment) in PyObject_REALLOC.
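
To make that concrete, here is a minimal sketch, assuming CPython 3
(other implementations may not support sys.getsizeof() at all). The
helper name reported_size is made up for illustration. It shows only
what the interpreter *reports* for a string object, which changes with
the widest character in the string; it says nothing about allocator
overhead or alignment.

```python
import sys

def reported_size(s):
    """Size in bytes that the interpreter reports for this string object."""
    return sys.getsizeof(s)

ascii_text = "a" * 1000
wide_text = "\u20ac" * 1000  # euro sign: does not fit in one byte per character

for label, s in [("ascii", ascii_text), ("wide", wide_text)]:
    # Same length, but the reported size differs with the internal representation.
    print(label, len(s), reported_size(s))
```

Whatever numbers this prints, they are the interpreter's own accounting,
not what the allocator actually handed out.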

So you either delve into the source and try to find every
single byte of overhead or wastage... or you just allocate a huge
bunch of strings and then ask your OS how much space you're consuming.
Yes, the OS is going to have very coarse granularity, but when you're
trying to figure out the RAM requirements of large-string
concatenations, you're looking for a large difference anyway.
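
A rough sketch of the "ask your OS" approach, assuming a Unix-like
system where the standard resource module is available (it is not on
Windows). The helpers peak_rss and build_strings are made-up names for
illustration. Note that ru_maxrss is the *peak* resident set size, and
its units depend on the platform (kilobytes on Linux, bytes on macOS),
so treat the result as a coarse indication only.

```python
import resource
import sys

def peak_rss():
    """Peak resident set size of this process, as reported by the OS.

    Units are platform-dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def build_strings(count, length):
    # Distinct contents, so every string is a separate object.
    return [str(i).rjust(length, "x") for i in range(count)]

before = peak_rss()
strings = build_strings(100_000, 100)
after = peak_rss()

# What the interpreter reports for the string objects alone
# (the list that holds them is not included).
reported = sum(sys.getsizeof(s) for s in strings)

print("sum of sys.getsizeof:", reported, "bytes")
print("peak RSS growth:", after - before, "(platform-dependent units)")
```

The two figures are not expected to agree. The gap between them is
the overhead that reading the object header alone won't show you.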

> Now a couple of words about RSS. That "R" is there for a reason; you should
> wonder what it is when it's not "R". And modern OSes are very modern and
> nobody knows what they do with virtual memory, or at least they can't
> fix bugs when something should be "R", but is actually "V" - for decades:
> https://bugzilla.kernel.org/show_bug.cgi?id=12309 (damn, now
> self-isolated from spam).
>
> I hope, the idea is clear: RSS is largely outside of your control, but
> bytes you allocate in your source are (or should be).
>

Technically yes, it's under your control. In practice, I'm not so sure.

ChrisA
