Re: [Python-Dev] PEP 454 (tracemalloc): new minimalist version

Nick Coghlan Fri, 18 Oct 2013 17:51:43 -0700

On 19 Oct 2013 03:57, "Charles-François Natali" <cf.nat...@gmail.com> wrote:
>
> Hi,
>
> I'm happy to see this move forward!


Speaking of which... Charles-François, would you be willing to act as
BDFL-Delegate for this PEP? This will be a very useful new analysis tool,
and between yourself and Victor it looks like you'll be able to come up
with a solid API.

I just suggested that approach to Guido and he also liked the idea :)

Cheers,
Nick.

>
> > API
> > ===
> >
> > Main Functions
> > --------------
> >
> > ``clear_traces()`` function:
> >
> >     Clear traces and statistics on Python memory allocations, and reset
> >     the ``get_traced_memory()`` counter.
>
> That's nitpicking, but how about just ``reset()`` (I'm probably biased
> by oprofile's opcontrol --reset)?
>
> > ``get_stats()`` function:
> >
> >     Get statistics on traced Python memory blocks as a dictionary
> >     ``{filename (str): {line_number (int): stats}}`` where *stats* is a
> >     ``(size: int, count: int)`` tuple, *filename* and *line_number* can
> >     be ``None``.
>
> It's probably obvious, but you might want to say once what *size* and
> *count* represent (and the unit for *size*).
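
A minimal sketch of the nested ``get_stats()`` shape quoted above, using
made-up data rather than a real call. It assumes *size* is a number of bytes
and *count* a number of memory blocks, which the draft does not state at this
point; ``totals_per_file`` is a hypothetical helper, not part of the proposal.

```python
# Hypothetical data in the shape the draft describes:
# {filename: {line_number: (size, count)}}
# Assumption: size is in bytes, count is a number of memory blocks.
stats = {
    "app.py": {10: (4096, 4), 25: (512, 1)},
    "util.py": {3: (1024, 8)},
    None: {None: (64, 2)},  # filename and line number may be None
}


def totals_per_file(stats):
    """Sum size and count over all lines of each file (hypothetical helper)."""
    result = {}
    for filename, lines in stats.items():
        size = sum(s for s, _ in lines.values())
        count = sum(c for _, c in lines.values())
        result[filename] = (size, count)
    return result


per_file = totals_per_file(stats)
```
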
>
> > ``get_tracemalloc_memory()`` function:
> >
> >     Get the memory usage in bytes of the ``tracemalloc`` module as a
> >     tuple: ``(size: int, free: int)``.
> >
> >     * *size*: total size of bytes allocated by the module,
> >       including *free* bytes
> >     * *free*: number of free bytes available to store data
>
> What's *free* exactly? I assume it's linked to the internal storage
> area used by tracemalloc itself, but that's not clear at all.
>
> Also, is the tracemalloc overhead included in the above stats (I'm
> mainly thinking about get_stats() and get_traced_memory())?
> If yes, I find it somewhat confusing: for example, AFAICT, valgrind's
> memcheck doesn't report the memory overhead, although it can be quite
> large, simply because it's not interesting.
>
> > Trace Functions
> > ---------------
> >
> > ``get_traceback_limit()`` function:
> >
> >     Get the maximum number of frames stored in the traceback of a trace
> >     of a memory block.
> >
> >     Use the ``set_traceback_limit()`` function to change the limit.
>
> I didn't see the default value for this setting anywhere: it would be
> nice to write it somewhere, and also explain the rationale (memory/CPU
> overhead...).
>
> > ``get_object_address(obj)`` function:
> >
> >     Get the address of the main memory block of the specified Python
> >     object.
> >
> >     A Python object can be composed of multiple memory blocks; the
> >     function only returns the address of the main memory block.
>
> IOW, this should return the same as id() on CPython? If yes, it could
> be an interesting note.
>
> > ``get_object_trace(obj)`` function:
> >
> >     Get the trace of a Python object *obj* as a ``(size: int,
> >     traceback)`` tuple where *traceback* is a tuple of ``(filename: str,
> >     lineno: int)`` tuples, *filename* and *lineno* can be ``None``.
>
> I find the "trace" word confusing, so it might be interesting to add a
> note somewhere explaining what it is ("callstack leading to the object
> allocation", or whatever).
>
> Also, this function leaves me with mixed feelings: it's called
> get_object_trace(), but you also return the object size - well, a
> vague estimate thereof. I wonder if the size really belongs here,
> especially if the information returned isn't really accurate: it will
> be for an integer, but not for e.g. a list, right? How about just
> using sys.getsizeof(), which would give a more accurate result?
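
A short illustration of the list case raised above, as a sketch only:
``sys.getsizeof()`` reports the size of the list object itself and does not
follow references to its elements, so a one-element list has the same reported
size whatever it holds. The variable names are illustrative.

```python
import sys

# Two one-element lists whose single elements differ greatly in size.
small = ["x"]
large = ["x" * 10**6]

# getsizeof() is shallow: it measures the list object, not the string inside.
list_sizes_equal = sys.getsizeof(small) == sys.getsizeof(large)

# The large string is accounted for only when measured directly.
element_bigger_than_list = sys.getsizeof(large[0]) > sys.getsizeof(large)
```
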
>
> > ``get_trace(address)`` function:
> >
> >     Get the trace of a memory block as a ``(size: int, traceback)``
> >     tuple where *traceback* is a tuple of ``(filename: str, lineno:
> >     int)`` tuples, *filename* and *lineno* can be ``None``.
> >
> >     Return ``None`` if the ``tracemalloc`` module did not trace the
> >     allocation of the memory block.
> >
> >     See also ``get_object_trace()``, ``get_stats()`` and
> >     ``get_traces()`` functions.
>
> Do you have example use cases where you want to work with raw addresses?
>
> > Filter
> > ------
> >
> > ``Filter(include: bool, pattern: str, lineno: int=None, traceback:
> > bool=False)`` class:
> >
> >     Filter to select which memory allocations are traced. Filters can be
> >     used to reduce the memory usage of the ``tracemalloc`` module, which
> >     can be read using the ``get_tracemalloc_memory()`` function.
> >
> > ``match(filename: str, lineno: int)`` method:
> >
> >     Return ``True`` if the filter matches the filename and line number,
> >     ``False`` otherwise.
> >
> > ``match_filename(filename: str)`` method:
> >
> >     Return ``True`` if the filter matches the filename, ``False``
> >     otherwise.
> >
> > ``match_lineno(lineno: int)`` method:
> >
> >     Return ``True`` if the filter matches the line number, ``False``
> >     otherwise.
> >
> > ``match_traceback(traceback)`` method:
> >
> >     Return ``True`` if the filter matches the *traceback*, ``False``
> >     otherwise.
> >
> >     *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples.
>
> Are those ``match`` methods really necessary for the end user, i.e.
> are they worth being exposed as part of the public API?
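
To make the ``match`` methods under discussion concrete, here is a rough sketch
of how such a filter could behave. ``SketchFilter`` is a hypothetical stand-in,
not the class from the draft, and it assumes that *pattern* uses shell-style
wildcards and that ``lineno=None`` means "any line"; neither assumption is
confirmed by the text quoted here.

```python
import fnmatch


class SketchFilter:
    """Hypothetical stand-in for the draft's Filter; not the real class."""

    def __init__(self, include, pattern, lineno=None):
        self.include = include  # stored only; not used by the match methods here
        self.pattern = pattern
        self.lineno = lineno

    def match_filename(self, filename):
        # Assumption: shell-style wildcards, compared case-sensitively.
        return fnmatch.fnmatchcase(filename or "", self.pattern)

    def match_lineno(self, lineno):
        # Assumption: a filter without a line number accepts every line.
        return self.lineno is None or self.lineno == lineno

    def match(self, filename, lineno):
        return self.match_filename(filename) and self.match_lineno(lineno)


flt = SketchFilter(True, "*/app/*.py", lineno=10)
```
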
>
> > StatsDiff
> > ---------
> >
> > ``StatsDiff(differences, old_stats, new_stats)`` class:
> >
> >     Differences between two ``GroupedStats`` instances.
> >
> >     The ``GroupedStats.compare_to()`` method creates a ``StatsDiff``
> >     instance.
> >
> > ``sort()`` method:
> >
> >     Sort the ``differences`` list from the biggest difference to the
> >     smallest difference. Sort by ``abs(size_diff)``, *size*,
> >     ``abs(count_diff)``, *count* and then by *key*.
> >
> > ``differences`` attribute:
> >
> >     Differences between ``old_stats`` and ``new_stats`` as a list of
> >     ``(size_diff, size, count_diff, count, key)`` tuples. *size_diff*,
> >     *size*, *count_diff* and *count* are ``int``. The key type depends
> >     on the ``GroupedStats.group_by`` attribute of ``new_stats``: see the
> >     ``Snapshot.top_by()`` method.
> >
> > ``old_stats`` attribute:
> >
> >     Old ``GroupedStats`` instance, can be ``None``.
> >
> > ``new_stats`` attribute:
> >
> >     New ``GroupedStats`` instance.
>
> Why keep references to ``old_stats`` and ``new_stats``?
> datetime.timedelta doesn't keep references to the date objects it was
> computed from.
>
> Also, if you sort the difference by default (which is a sensible
> choice), then the StatsDiff becomes pretty much useless, since you
> would just keep its ``differences`` attribute (sorted).
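
The ordering described for ``StatsDiff.sort()`` can be sketched on a plain list
of tuples, which is also roughly what remains if only the sorted
``differences`` are kept. The data is made up, and the sketch assumes that all
keys are mutually comparable and that the final tie-break on *key* is
descending as well, which the draft leaves open.

```python
# Hypothetical (size_diff, size, count_diff, count, key) tuples.
differences = [
    (-200, 800, -2, 8, "b.py"),
    (100, 1100, 1, 11, "a.py"),
    (200, 500, 3, 5, "c.py"),
]


def sort_key(diff):
    """Order by abs(size_diff), size, abs(count_diff), count, then key."""
    size_diff, size, count_diff, count, key = diff
    return (abs(size_diff), size, abs(count_diff), count, key)


# Biggest difference first.
ordered = sorted(differences, key=sort_key, reverse=True)
```
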
>
> > Snapshot
> > --------
> >
> > ``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats:
> > dict=None)`` class:
> >
> >     Snapshot of traces and statistics on memory blocks allocated by
> >     Python.
>
>
> I'm confused.
> Why are get_trace(), get_object_trace(), get_stats() etc not methods
> of a Snapshot object?
> Is it because you don't store all the necessary information in a
> snapshot, or are they just some sort of shorthands, like:
> stats = get_stats()
> vs
> snapshot = Snapshot.create()
> stats = snapshot.stats
>
> > ``write(filename)`` method:
> >
> >     Write the snapshot into a file.
>
> I assume it's in a serialized form, only readable by Snapshot.load()?
> BTW, it's a nitpick and debatable, but write()/read() or load()/dump()
> would be more consistent (see e.g. pickle's load/dump).
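
For reference, the ``pickle`` naming pair mentioned above looks like this in
use. This is only a sketch of the ``dump()``/``load()`` convention: the
``snapshot`` dictionary is a made-up placeholder, and nothing here says how the
proposed ``Snapshot.write()`` actually serializes its data.

```python
import datetime
import os
import pickle
import tempfile

# Hypothetical snapshot-like payload; not the draft's Snapshot class.
snapshot = {
    "timestamp": datetime.datetime(2013, 1, 1, 12, 0, 0),
    "stats": {"app.py": {10: (4096, 4)}},
}

path = os.path.join(tempfile.mkdtemp(), "snapshot.pickle")

# dump() writes the object to a binary file ...
with open(path, "wb") as f:
    pickle.dump(snapshot, f)

# ... and load() reads it back.
with open(path, "rb") as f:
    restored = pickle.load(f)
```
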
>
> > Metric
> > ------
> >
> > ``Metric(name: str, value: int, format: str)`` class:
> >
> >     Value of a metric when a snapshot is created.
>
> Alright, what's a metric again ;-) ?
>
> I don't know if it's customary, but having short examples would IMO be
> nice.
>
> cf
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

