Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Raymond Hettinger

On Oct 28, 2013, at 1:16 PM, Victor Stinner  wrote:

> so what is the
> status of the PEP 455 (TransformDict)?


I'm giving a thorough evaluation of the proposal
and am devoting chunks of time each weekend
to reviewing the email threads, the links provided
in the PEPs, looking at how well the TD fits in existing code.

I'm giving this work priority over my own list of things
to add to 3.4 (most of which will now have to wait until 3.5).

This week, I'm teaching a five-day intermediate python class
to highly experienced network engineers in Denver.  We'll do
some exercises using the TD and evaluate the results against
alternative approaches.

Here are some preliminary notes (in no particular order):

* A first reading of the python-dev threads suggests that
the two-dict TD implementation is better suited to
implementing a case-folding, case-preserving dictionary
and is far less well suited to a plain case-folding dict or an
identity dict.

* There are interesting differences between the proposed TD
and the CaseInsensitiveDict implemented in Kenneth Reitz's
HTTP requests library.  The latter keeps the last key added
rather than the first.   It also has a cleaner implementation
and the API is a bit nicer (no getitem() method).

* The "originals" dict maps a transformed key back to its
first saved original value.  An alternative would be to map
back to a set of original values or a list of original values.

* A possible use case is for a unicode normalizing dictionary
where  'L' + chr(111) + chr(776) + 'wis'  would match
'L' + chr(246) + 'wis'.
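
That normalizing match can be sketched with the stdlib unicodedata module. This is a plain dict with the transform applied at the call sites, not the proposed TD itself, but it shows why the two spellings would collide:

```python
import unicodedata

def nfc(key):
    # Transform function: normalize to NFC so canonically
    # equivalent spellings map to the same key.
    return unicodedata.normalize("NFC", key)

decomposed = "L" + chr(111) + chr(776) + "wis"   # 'o' + combining diaeresis
composed = "L" + chr(246) + "wis"                # precomposed 'ö'

d = {}
d[nfc(decomposed)] = "found"
print(d[nfc(composed)])  # prints "found"
```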

* The implementation looks rough at this point, but that is easily
fixed-up.  I'll leave some specific suggestions on the tracker
(getting it to accept a list of tuples in the constructor, a recursive
repr, renaming the getitem() method, deprivatizing the attributes,
getting it to work with __missing__, etc).

* Having two-mappings-in-one seems to be the most awkward
part of the design and wouldn't be needed for the two strongest
use cases, a case-insensitive-but-not-case-preserving dict
and an identity dict.

* In http://stackoverflow.com/questions/13230414, the OP
wants a CI dict but doesn't need the case preserving aspect.
The OP is provided with a regular dictionary containing mixed case
keys and needs to compare a list of potential matches of unknown case. 
Copying the dict to a case-folding TD wouldn't provide any incremental
advantage over building a regular dict with lower case keys.

* In http://www.gossamer-threads.com/lists/python/python/209527,
the OP wanted an ID comparison for a symbolic calculation.
The objects already had an eq function and he wanted to temporarily
bypass that in a symbol lookup.  Essentially he needed a dictionary
that would allow the use of an alternative equality function.
A TD would work here but there would be no need for the 
key preserving feature.  There doesn't seem to be any advantage
over using a regular dict that directly stores the id() as the key.
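
The regular-dict-with-id()-keys approach mentioned above can be sketched as follows. IdentityDict is a hypothetical helper name for illustration; keeping the original object alive alongside the value also avoids id() reuse:

```python
class IdentityDict:
    """Minimal sketch (not the proposed TD): store values under id(key).

    The original object is kept alongside the value so that keys are
    kept alive (preventing id() reuse) and can be recovered as real
    objects rather than integers.
    """
    def __init__(self):
        self._data = {}          # id(key) -> (key, value)

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._data[id(key)][1]

    def __contains__(self, key):
        return id(key) in self._data

a, b = [], []          # equal (a == b) but distinct objects
d = IdentityDict()
d[a] = "first"
d[b] = "second"
print(d[a], d[b])      # prints "first second": identity, not equality
```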

* In https://mail.python.org/pipermail/python-ideas/2010-May/007235.html,
the OP wants a highly optimized identity dictionary that doesn't
call an expensive id() function.   The proposed TD doesn't fulfill
this request -- it would be far from being optimized and would
call id() on every store and every lookup.

* In http://msdn.microsoft.com/en-us/library/xfhwa508.aspx,
the API describes a dict with an alternative equality comparison.
This is a different design than the TD and is for a world that is
somewhat different from Python.  In that dict, eq and hash
are specified at the dict level rather than object level
(in Python, the key objects define their own __eq__ and __hash__
rather than having the user attach the functions directly to the dictionary).

* In http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html,
the IdentityHashMap is described as being for rare use cases
such as topology-preserving object graph transformations
like serialization or deep-copying.  Looking at Python's own code
for copy.deepcopy(), the TD would not be a good replacement
for the existing code (more memory intensive, slower, no use
for the originals dict, and no use for most of the TD functionality).
It appears that it is simpler and faster to store and lookup d[id(obj)] than
to use a TD.

* If this were being modeled in a database, we would have one table
with a many-to-one mapping of original keys to transformed keys
and another table with a transformed key as the primary key in a
table of key-value pairs.   This suggests two lookup patterns:
original->transformed->value and transformed->all_originals.
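
A minimal sketch of the two-dict design described in these notes (not the actual PEP 455 implementation; MiniTransformDict is an illustrative name):

```python
class MiniTransformDict:
    """Two dicts: transformed key -> value, and
    transformed key -> first original key seen."""

    def __init__(self, transform):
        self._transform = transform
        self._data = {}       # transformed key -> value
        self._original = {}   # transformed key -> first original key

    def __setitem__(self, key, value):
        t = self._transform(key)
        self._original.setdefault(t, key)   # keep the *first* original
        self._data[t] = value

    def __getitem__(self, key):
        return self._data[self._transform(key)]

    def getitem(self, key):
        """Return (first_original_key, value), as in the PEP."""
        t = self._transform(key)
        return self._original[t], self._data[t]

headers = MiniTransformDict(str.lower)
headers["Content-Type"] = "text/plain"
headers["CONTENT-TYPE"] = "text/html"
print(headers.getitem("content-type"))  # ('Content-Type', 'text/html')
```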

* The Apache case insensitive dict documentation includes these
thoughts: "This map will violate the detail of various Map and
map view contracts. As a general rule, don't compare this map to
other maps. In particular, you can't use decorators like
ListOrderedMap on it, which silently assume that these contracts
are fulfilled. --- Note that CaseInsensitiveMap is not synchronized an

Re: [Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

2013-10-30 Thread Victor Stinner
Hi,

2013/10/30 Jim J. Jewett :
> Well, unless I missed it... I don't see how to get anything beyond
> the return value of get_traces, which is a (time-ordered?) list
> of allocation size with then-current call stack.  It doesn't mention
> any attribute for indicating that some entries are de-allocations,
> let alone the actual address of each allocation.

get_traces() does return the traces of the currently allocated memory
blocks. It's not a log of alloc/dealloc calls. The list is not sorted.
If you want a sorted list, use take_snapshot.statistics('lineno') for
example.

> In that case, I would expect disabling (and filtering) to stop
> capturing new allocation events for me, but I would still expect
> tracemalloc to do proper internal maintenance.

tracemalloc has a significant overhead in terms of performance and
memory. The purpose of disable() is to... disable the module, to
remove the overhead completely.

In practice, enable() installs hooks on the memory allocators, and
disable() uninstalls these hooks.

I don't understand why you are so concerned by disable(). Why would
you want to keep traces and disable the module? I never call
disable() in my own tests; the module is automatically disabled at
exit.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

2013-10-30 Thread Victor Stinner
2013/10/30 Stephen J. Turnbull :
> Just "reset" implies to me that you're ready to start over.  Not just
> traced memory blocks but accumulated statistics and any configuration
> (such as Filters) would also be reset.  Also tracing would be disabled
> until started explicitly.

If the name is really the problem, I propose to restore the previous
name: clear_traces(). It's symmetric with get_traces(), like
add_filter()/get_filters()/clear_filters().


> Shouldn't disable() do this automatically, perhaps with an optional
> discard_traces flag (which would be False by default)?

The pattern is something like this:

enable()
snapshot1 = take_snapshot()
...
snapshot2 = take_snapshot()
disable()

I don't see why disable() would return data.


> But I definitely agree with Jim:  You *must* provide an example here
> showing how to save the traces (even though it's trivial to do so),
> because that will make clear that disable() is a destructive
> operation.  (It is not destructive in any other debugging tool that
> I've used.)  Even with documentation, be prepared for user complaints.

I added "Call get_traces() or take_snapshot() function to get traces
before clearing them." to the doc:

http://www.haypocalc.com/tmp/tracemalloc/library/tracemalloc.html#tracemalloc.disable

Victor


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-30 Thread Kristján Valur Jónsson


> -Original Message-
> From: Victor Stinner [mailto:victor.stin...@gmail.com]
> Sent: 29. október 2013 21:30
> To: Kristján Valur Jónsson
> Cc: python-dev
> Subject: Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
> tracemalloc maintains a dictionary of all allocated memory blocks, which is
> slow and eats a lot of memory. You don't need tracemalloc to log calls to
> malloc/realloc/free. You can write your own hook using the PEP 445 (malloc
> API). Code just writing to stderr should not be longer than 100 lines
> (tracemalloc is closer to 2500 lines).
> 

The point of a PEP is getting something into standard Python.  The
command line flag is also part of this.  Piggybacking a lightweight
client/server data-gathering version of this on top of the PEP could
be beneficial in that respect.

Unless I am mistaken, the PEP 445 hooks must be set up before calling
Py_Initialize(), so using them is not trivial.

Anyway, just a suggestion, for the record.

K


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-30 Thread Victor Stinner
2013/10/30 Kristján Valur Jónsson :
> The point of a PEP is getting something into standard python.  The command 
> line flag is also part of this.
> Piggybacking a lightweight client/server data-gathering version of this on 
> top of the PEP
> could be beneficial in that respect.

In my opinion, your use case (log malloc/free and send it over the
network) is completely different from what tracemalloc does. Reusing
tracemalloc for it would be inefficient (slow, and it would use too
much memory).

You can use tracemalloc if you want to send a snapshot of traces every
N minutes. It should not be hard (less than 100 lines of Python) to
implement that using a thread, pickle and a socket.
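
The thread-plus-pickle-plus-socket idea above might look something like this. The length-prefixed framing and the function names are illustrative assumptions, not part of any proposed API:

```python
import pickle
import socket
import threading
import time
import tracemalloc

def snapshot_payload():
    """Pickle the current snapshot, with an 8-byte length prefix
    so the receiver knows how much to read."""
    data = pickle.dumps(tracemalloc.take_snapshot(),
                        pickle.HIGHEST_PROTOCOL)
    return len(data).to_bytes(8, "big") + data

def start_sender(host, port, interval=60.0):
    """Background thread: send a snapshot every `interval` seconds."""
    def loop():
        while True:
            with socket.create_connection((host, port)) as sock:
                sock.sendall(snapshot_payload())
            time.sleep(interval)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```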

But I prefer not to include it in the PEP: Charles-François wants a
minimal module and prefers to develop tools on top of the module. I
*now* agree with him (at first I wanted to pack everything into the
stdlib, even escape sequences to write text with colors!).

For example, the old code using "tasks" to automatically take a
snapshot every N minutes, or display the top 10 allocations every N
minutes in the terminal with colors, has been moved to a new project:

   https://github.com/haypo/pytracemalloctext/blob/master/doc/index.rst

(the project is not usable yet, I will finish it after the PEP 454,
and after updating the pytracemalloc module on PyPI)

> Unless I am mistaken, the Pep 445 hooks must be setup before calling 
> Py_Initialize() and so using
> them is not trivial.

It depends on how you use the API. If you want to replace the memory
allocators (use your own "malloc"), you have to call
PyMem_SetAllocator() *before the first memory allocation*! In Python,
the first memory allocation occurs much earlier than Py_Initialize():
PyMem_RawMalloc() is the *first* instruction (!) executed by Python in
its main() function... (see Modules/python.c).

If you want to install a hook calling the previous allocator, you can
call PyMem_SetAllocator() at any time: that's why it's possible to
call tracemalloc.enable() at any time.

Victor


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread R. David Murray
On Wed, 30 Oct 2013 01:12:03 -0600, Raymond Hettinger 
 wrote:
> If I had to choose right now, a safe choice would be to focus on
> the primary use case and implement a clean CaseInsensitiveDict
> without the double-dict first-saved case-preserving feature.
> That said, I find the TD to be fascinating and there's more work
> to do before making a decision.

Please be aware that the PEP author's motivation in submitting the PEP was
to have a case insensitive, case *preserving* dict.  The generalization
serves to make the new datatype more useful, but if the end result
doesn't satisfy the original use case of the author, I won't be
surprised if he has no motivation to work on it further :).

--David


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread R. David Murray
On Wed, 30 Oct 2013 01:12:03 -0600, Raymond Hettinger 
 wrote:
> I'm giving a thorough evaluation of the proposal
> and am devoting chunks of time each weekend
> to reviewing the email threads, the links provided
> in the PEPs, looking at how well the TD fits in existing code.
> 
> I'm giving this work priority over my own list of things
> to add to 3.4 (most of which will now have to wait until 3.5).

And thanks for doing all this work, Raymond.  I forgot to say
that in my previous post.

--David


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Ethan Furman

On 10/30/2013 12:12 AM, Raymond Hettinger wrote:


Hopefully, this post will make the thought process more transparent.


Thanks, Raymond.  Your time is appreciated.

--
~Ethan~


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Nigel Small
It strikes me that there could be an alternative approach to some of the
use cases discussed here. Instead of a new type of dictionary, the
case-insensitivity problem could be solved with something akin to a *
CaseInsensitiveString* class used for keys within a standard dictionary.
This would be very similar to a normal string except with comparison and
hashing. It would mean that CaseInsensitiveString("Foo") is considered
equal to CaseInsensitiveString("foo") and allow code such as the following:

>>> headers = {}
>>> headers[CaseInsensitiveString("content-type")] = "text/plain"
>>> headers[CaseInsensitiveString("Content-Type")]
"text/plain"

This would obviously also be usable in other places where case-insensitive
strings are required.
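
A minimal sketch of such a class, under the assumption that it folds case for equality and hashing (and leaving aside how it should compare with plain strings):

```python
class CaseInsensitiveString(str):
    """A str subclass that folds case for equality and hashing."""

    def __eq__(self, other):
        if isinstance(other, str):
            return self.casefold() == other.casefold()
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Must be consistent with __eq__: equal keys, equal hashes.
        return hash(self.casefold())

headers = {}
headers[CaseInsensitiveString("content-type")] = "text/plain"
print(headers[CaseInsensitiveString("Content-Type")])  # prints "text/plain"
```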

Just my two pence/cents/other minor currency units.
Nigel


On 30 October 2013 14:18, Ethan Furman  wrote:

> On 10/30/2013 12:12 AM, Raymond Hettinger wrote:
>
>>
>> Hopefully, this post will make the thought process more transparent.
>>
>
> Thanks, Raymond.  Your time is appreciated.
>
> --
> ~Ethan~
>


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-30 Thread Victor Stinner
> Snapshot
> 
>
> ``Snapshot(timestamp: datetime.datetime, traceback_limit: int, stats:
> dict=None, traces: dict=None)`` class:
>
> Snapshot of statistics and traces of memory blocks allocated by
> Python.
>
> ``apply_filters(filters)`` method:
>
> Apply filters on the ``traces`` and ``stats`` dictionaries,
> *filters* is a list of ``Filter`` instances.

Snapshot.apply_filters() currently works in-place. This is not
convenient. It should create a new Snapshot instance.

For example, I have a huge snapshot with 800K+ traces. I would like to
ignore  and  filenames: I apply a first filter to exclude <*>. Then I
only want to see allocations related to the regular expressions: I
apply a second pair of filters to only include */sre*.py and */re.py.

Ok, now I want to see other files. Uh oh, I lose all the other traces;
I have to reload the huge snapshot. And again, exclude <*>.

I would prefer something like:

full_snapshot = Snapshot.load("huge.pickle")
clean = full_snapshot.apply_filters([Filter(False, "<*>")])
# delete maybe full_snapshot here
regex = clean.apply_filters([Filter(True, "*/re.py"), Filter(True,
"*/sre*.py")])
other = clean.apply_filters([Filter(False, "*/re.py"), Filter(False,
"*/sre*.py")])
...

> ``Filter(include: bool, filename_pattern: str, lineno: int=None,
> traceback: bool=False)`` class:
> ...
> ``traceback`` attribute:
>
>If *traceback* is ``True``, all frames of the traceback are checked.
>If *traceback* is ``False``, only the most recent frame is checked.
>
>This attribute is ignored if the traceback limit is less than ``2``.
>See the ``get_traceback_limit()`` function.

Hmm, I don't really like the name "traceback". traceback=False is
confusing because the traceback is still used by the filter even when
traceback=False.

Other names: all_frames, any_frame, most_recent_frame_only, ...?

Example:

   f1 = Filter("*/linecache.py", all_frames=True)
   f2 = Filter("*/linecache.py")   # all_frames is False by default

Victor


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Ethan Furman

On 10/30/2013 09:34 AM, Nigel Small wrote:


It strikes me that there could be an alternative approach to some of the use 
cases discussed here. Instead of a new type
of dictionary, the case-insensitivity problem could be solved with something 
akin to a *CaseInsensitiveString* class [...]


The nice thing about the TransformDict is that it is usable for much more
than simple case-insensitivity.

--
~Ethan~


Re: [Python-Dev] PEP 451 update

2013-10-30 Thread Eric Snow
On Tue, Oct 29, 2013 at 7:29 PM, Nick Coghlan  wrote:
> OK, time for me to stop trying to remember the details of the problem
> I'm trying to solve and go look them up in the source code :)
>
> One of my goals here is to be able to migrate extension loading from
> the old API to the new plugin API. That means being able to break up
> the existing load_module implementation:
>
> http://hg.python.org/cpython/file/1787277915e9/Python/importdl.c#l23
>
> For loading, that's a fairly straightforward create_module/exec_module
> split, but reloading gets a bit more interesting.
>
> Specifically, I'd like to be able to implement the relevant parts of
> _PyImport_FindExtensionObject as a precheck for reloading:
>
> http://hg.python.org/cpython/file/1787277915e9/Python/import.c#l533
>
> That means just having access to the module name isn't enough: the
> extensions dictionary is keyed by a (name, filename) 2-tuple rather
> than just by the module name. Using the find_spec API, that filename
> would be stored in the loader state on the spec object rather than
> being looked up anew during the load operation.
>
> However, rereading this method also indicates that I really want to
> know *in exec_module* whether this is a reload or not, since extension
> loading needs to handle reloads differently from initial execution.
>
> So I'm back to my original preference: I'd like the previous spec to
> be passed to exec_module in the reloading case. If reloading is not
> supported at all, it's up to the loader to throw an appropriate
> exception when the previous spec is not None. If loading and
> reloading doesn't make any difference, then they can just ignore it.
> But when both are supported, but handled differently (as for extension
> modules), then that can be detected, and the information from the old
> spec (including the original loader and loader state) is available if
> needed.

Our recent discovery about reloading should probably be reflected in
the signature of finder.find_spec():

  MetaPathFinder.find_spec(name, path=None, existing=None)
  PathEntryFinder.find_spec(name, existing=None)

This way the finder has an opportunity to incorporate information from
an existing spec into the spec it returns.  reload() would make use of
this by passing module.__spec__ (or None if the module has no
__spec__) to _bootstrap._find_spec().

This approach should also address what you are looking for.  I'd
prefer it over passing the existing spec to exec_module().  The module
(and its __spec__) should have everything exec_module() needs to do
its job.

We would still need to use loader.supports_reload() in reload().
However, it may make more sense to pass in the module-to-be-reloaded
(after updating its __spec__ to the one found by
_bootstrap._find_spec()).

-eric


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Nigel Small
True, but I could similarly argue that the nice thing about
CaseInsensitiveString is that it is usable for much more than dictionary
keys; it just depends on your point of view.

There would be nothing stopping other types of dictionary key
transformation being covered by other key data types in a similar way, I'm
simply trying to raise the question of where the genericity could sit: in
the dictionary or in the key.

Nigel


On 30 October 2013 17:04, Ethan Furman  wrote:

> On 10/30/2013 09:34 AM, Nigel Small wrote:
>
>>
>> It strikes me that there could be an alternative approach to some of the
>> use cases discussed here. Instead of a new type
>> of dictionary, the case-insensitivity problem could be solved with
>> something akin to a *CaseInsensitiveString* class [...]
>>
>
> The nice thing about the TransformDict is that it is usable for much more
> than simple case-insensitivity.
>
>
> --
> ~Ethan~


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Antoine Pitrou
On Wed, 30 Oct 2013 16:34:33 +
Nigel Small  wrote:
> It strikes me that there could be an alternative approach to some of the
> use cases discussed here. Instead of a new type of dictionary, the
> case-insensitivity problem could be solved with something akin to a *
> CaseInsensitiveString* class used for keys within a standard dictionary.
> This would be very similar to a normal string except with comparison and
> hashing. It would mean that CaseInsensitiveString("Foo") is considered
> equal to CaseInsensitiveString("foo") and allow code such as the following:

And how does a case-insensitive string compare with a normal
(case-sensitive) string? This is a can of worms.

(if you answer, please don't answer in this thread but open a separate
one for case-insensitive strings, thanks)

Regards

Antoine.




Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Skip Montanaro
> And how does a case-insensitive string compare with a normal
> (case-sensitive) string? This is a can of worms.

I was wondering this myself. I suspect it would depend which string is
on the left hand side of the comparison operator, yes? Can of worms,
indeed.

implicit-insensitve-i-n-ly, y'rs,

Skip


Re: [Python-Dev] PEP 455: TransformDict

2013-10-30 Thread Antoine Pitrou

Hi Raymond,

On Wed, 30 Oct 2013 01:12:03 -0600
Raymond Hettinger  wrote:
> 
> On Oct 28, 2013, at 1:16 PM, Victor Stinner  wrote:
> 
> > so what is the
> > status of the PEP 455 (TransformDict)?
> 
> 
> I'm giving a thorough evaluation of the proposal
> and am devoting chunks of time each weekend
> to reviewing the email threads, the links provided
> in the PEPs, looking at how well the TD fits in existing code.

Thanks for the thorough status report.

> * There are interesting differences between the proposed TD
> and the CaseInsensitiveDict implemented in Kenneth Reitz's
> HTTP requests library.  The latter keeps the last key added
> rather than the first.   It also has a cleaner implementation
> and the API is a bit nicer (no getitem() method).

First-vs-last has already been discussed in the previous thread. My
initial hunch was to keep the last key, but other people made the point
that first was both more compliant (with current dict behaviour) and
more useful (since you can override it by deleting and then reinserting
the entry).

Regards

Antoine.




Re: [Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

2013-10-30 Thread Jim Jewett
On Wed, Oct 30, 2013 at 6:02 AM, Victor Stinner
 wrote:
> 2013/10/30 Jim J. Jewett :
>> Well, unless I missed it... I don't see how to get anything beyond
>> the return value of get_traces, which is a (time-ordered?) list
>> of allocation size with then-current call stack.  It doesn't mention
>> any attribute for indicating that some entries are de-allocations,
>> let alone the actual address of each allocation.


> get_traces() does return the traces of the currently allocated memory
> blocks. It's not a log of alloc/dealloc calls. The list is not sorted.
> If you want a sorted list, use take_snapshot.statistics('lineno') for
> example.

Any list is sorted somehow; I had assumed that it was defaulting to
order-of-creation, though if you use a dict internally, that might not
be the case.  If you return it as a list instead of a dict, but that
list is NOT in time-order, that is worth documenting.
Also, am I misreading the documentation of get_traces() function?

Get traces of memory blocks allocated by Python.
Return a list of (size: int, traceback: tuple) tuples.
traceback is a tuple of (filename: str, lineno: int) tuples.


So it now sounds like you don't bother to emit de-allocation
events because you just remove the allocation from your
internal data structure.

In other words, you provide a snapshot, but not a history --
except that the snapshot isn't complete either, because it
only shows things that appeared after a certain event
(the most recent enablement).

I still don't see anything here(*) that requires even saving
the address, let alone preventing re-use.

(*) get_object_traceback(obj) might require a stored
 address for efficiency, but the base functionality of
getting traces doesn't.

I still wouldn't worry about address re-use though,
because the address should not be re-used until
the object has been deleted -- and is no longer
available to be passed to get_object_traceback.
So the worst that can happen is that an object which
was not traced might return a bogus answer
instead of failing.

>> In that case, I would expect disabling (and filtering) to stop
>> capturing new allocation events for me, but I would still expect
>> tracemalloc to do proper internal maintenance.

> tracemalloc has an important overhead in term of performances and
> memory. The purpose of disable() is to... disable the module, to
> remove completely the overhead.
> ...  Why would you like to keep traces and disable the module?

Because of that very overhead.  I think my typical use case would
be similar to Kristján Valur's, but I'll try to spell it out in more
detail here.

(1)  Whoa -- memory hog!  How can I fix this?

(2)  I know -- track all allocations, with a traceback showing why they
were made.  (At a minimum, I would like to be able to subclass your
tool to do this -- preferably without also keeping the full history in
memory.)

(3)  Oh, maybe I should skip the ones that really are temporary and
get cleaned up.  (You make this easy by handling the de-allocs,
though I'm not sure those events get exposed to anyone working at
the python level, as opposed to modifying and re-compiling.)

(4)  hmm... still too big ... I should use filters.  (But will changing those
filters while tracing is enabled mess up your current implementation?)

(5)  Argh.  What I really want is to know what gets allocated at times
like XXX.
I can do that if times-like-XXX only ever occur once per process.  I *might* be
able to do it with filters.  But I would rather do it by saying "trace on" and
"trace off".   Maybe even with a context manager around the suspicious
places.

(6)  Then, at the end of the run, I would say "give me the info about how much
was allocated when tracing was on."  Some of that might be going away
again when tracing is off, but at least I know what is making the allocations
in the first place.  And I know that they're sticking around "long enough".

Under your current proposal, step (5) turns into

set filters
trace on
...
get_traces
serialize to some other storage
trace off

 and step (6) turns into
read in from that other storage I just made up on the fly, and do my own
summarizing, because my format is almost by definition non-standard.

This complication isn't intolerable, but neither is it what I expect
from python.
And it certainly isn't what I expect from a binary toggle like enable/disable.
(So yes, changing the name to clear_traces would help, because I would
still be disappointed, but at least I wouldn't be surprised.)

Also, if you do stick with the current limitations, then why even have
get_traces, as opposed to just take_snapshot?  Is there some difference
between them, except that a snapshot has some convenience methods and
some simple metadata?

Later, he wrote:
> I don't see why disable() would return data.

disable is indeed a bad name for something that returns data.

The only reason to return data from "disable" i

Re: [Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

2013-10-30 Thread Victor Stinner
On Oct 30, 2013 20:58, "Jim Jewett"  wrote:
> though if you use a dict internally, that might not
> be the case.

tracemalloc uses a {address: trace} dict internally.

>  If you return it as a list instead of a dict, but that list is
> NOT in time-order, that is worth documenting

OK, I will document it.

> Also, am I misreading the documentation of get_traces() function?
>
> Get traces of memory blocks allocated by Python.
> Return a list of (size: int, traceback: tuple) tuples.
> traceback is a tuple of (filename: str, lineno: int) tuples.
>
>
> So it now sounds like you don't bother to emit de-allocation
> events because you just remove the allocation from your
> internal data structure.

I don't understand your question. tracemalloc does not store events but
traces. When a memory block is deallocated, it is removed from the
internal dict (and so from the get_traces() list).

> I still don't see anything here(*) that requires even saving
> the address, let alone preventing re-use.

The address must be stored internally to maintain the internal dict. See
the C code.

> (1)  Whoa -- memory hog!  How can I fix this?
>
> (2)  I know -- track all allocations, with a traceback showing why they
> were made.  (At a minimum, I would like to be able to subclass your
> tool to do this -- preferably without also keeping the full history in
> memory.)

What do you mean by "full history" and "subclass your tool"?

> (3)  Oh, maybe I should skip the ones that really are temporary and
> get cleaned up.  (You make this easy by handling the de-allocs,
> though I'm not sure those events get exposed to anyone working at
> the python level, as opposed to modifying and re-compiling.)

If your temporary objects are destroyed before you call get_traces(), you
will not see them in get_traces(). I don't understand.

> (4)  hmm... still too big ... I should use filters.  (But will changing
> those filters while tracing is enabled mess up your current implementation?)

If you call add_filter(), new traces will be filtered, but not the old ones,
as explained in the doc. What do you mean by "mess up"?
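(For reference, the tracemalloc module as it later shipped in Python 3.4
moved filtering onto snapshots instead of a module-level add_filter(); a
hedged sketch of that style, with an illustrative exclusion pattern:)

```python
import tracemalloc

tracemalloc.start()
data = [bytes(100) for _ in range(100)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# inclusive=False means "filter out" traces matching the pattern.
# The pattern here is illustrative, not part of the draft API discussed above.
filtered = snapshot.filter_traces([
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
])
```

Because filtering happens on an already-taken snapshot, old and new traces
are treated uniformly, which sidesteps the "mess up" question for that API.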

> (5)  Argh.  What I really want is to know what gets allocated at times
> like XXX.  I can do that if times-like-XXX only ever occur once per
> process.  I *might* be able to do it with filters.  But I would rather
> do it by saying "trace on" and "trace off".   Maybe even with a context
> manager around the suspicious places.

I don't understand "times like XXX", what is it?

To see what happened between two lines of code, you can compare two
snapshots. No need to disable tracing.

> (6)  Then, at the end of the run, I would say "give me the info about
> how much was allocated when tracing was on."  Some of that might be
> going away again when tracing is off, but at least I know what is making
> the allocations in the first place.  And I know that they're sticking
> around "long enough".

I think you misunderstood how tracemalloc works. You should compile it and
play with it. In my opinion, you already have everything in tracemalloc for
your scenario.

> Under your current proposal, step (5) turns into
>
> set filters
> trace on
> ...
> get_traces
> serialize to some other storage
> trace off

s1 = take_snapshot()
...
s2 = take_snapshot()
...
diff = s2.statistics("lines", compare_to=s1)
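(For comparison, in the tracemalloc API as it eventually shipped, the same
diff is spelled with Snapshot.compare_to() rather than a compare_to= keyword
on statistics(); a runnable sketch:)

```python
import tracemalloc

tracemalloc.start()
s1 = tracemalloc.take_snapshot()
data = [bytes(1000) for _ in range(1000)]  # the suspicious allocations
s2 = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group differences by (filename, line number) and show the top entries.
diff = s2.compare_to(s1, 'lineno')
for stat in diff[:3]:
    print(stat)
```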

> why even have get_traces, as opposed to just take_snapshot?  Is there
> some difference between them, except that a snapshot has some convenience
> methods and some simple metadata?

See the doc: Snapshot.traces is the result of get_traces().

get_traces() is here if you want to write your own tool without Snapshot.
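A minimal sketch of that "own tool" case, assuming the API as it shipped,
where the raw traces are exposed as Snapshot.traces and each trace pairs a
block size with its allocation traceback:

```python
import tracemalloc

tracemalloc.start()
buf = bytearray(100000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Aggregate allocated bytes per file without using Snapshot's helpers.
totals = {}
for trace in snapshot.traces:
    filename = trace.traceback[0].filename
    totals[filename] = totals.get(filename, 0) + trace.size
```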

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 451 update

2013-10-30 Thread Nick Coghlan
On 31 Oct 2013 03:41, "Eric Snow"  wrote:
>
> On Tue, Oct 29, 2013 at 7:29 PM, Nick Coghlan  wrote:
> > OK, time for me to stop trying to remember the details of the problem
> > I'm trying to solve and go look them up in the source code :)
> >
> > One of my goals here is to be able to migrate extension loading from
> > the old API to the new plugin API. That means being able to break up
> > the existing load_module implementation:
> >
> > http://hg.python.org/cpython/file/1787277915e9/Python/importdl.c#l23
> >
> > For loading, that's a fairly straightforward create_module/exec_module
> > split, but reloading gets a bit more interesting.
> >
> > Specifically, I'd like to be able to implement the relevant parts of
> > _PyImport_FindExtensionObject as a precheck for reloading:
> >
> > http://hg.python.org/cpython/file/1787277915e9/Python/import.c#l533
> >
> > That means just having access to the module name isn't enough: the
> > extensions dictionary is keyed by a (name, filename) 2-tuple rather
> > than just by the module name. Using the find_spec API, that filename
> > would be stored in the loader state on the spec object rather than
> > being looked up anew during the load operation.
> >
> > However, rereading this method also indicates that I really want to
> > know *in exec_module* whether this is a reload or not, since extension
> > loading needs to handle reloads differently from initial execution.
> >
> > So I'm back to my original preference: I'd like the previous spec to
> > be passed to exec_module in the reloading case. If reloading is not
> > supported at all, it's up to the loader to throw an appropriate
> > exception when the previous spec is not None. If loading and
> > reloading doesn't make any difference, then they can just ignore it.
> > But when both are supported, but handled differently (as for extension
> > modules), then that can be detected, and the information from the old
> > spec (including the original loader and loader state) is available if
> > needed.
>
> Our recent discovery about reloading should probably be reflected in
> the signature of finder.find_spec():
>
>   MetaPathFinder.find_spec(name, path=None, existing=None)
>   PathEntryFinder.find_spec(name, existing=None)
>
> This way the finder has an opportunity to incorporate information from
> an existing spec into the spec it returns.  reload() would make use of
> this by passing module.__spec__ (or None if the module has no
> __spec__) to _bootstrap._find_spec().
>
> This approach should also address what you are looking for.  I'd
> prefer it over passing the existing spec to exec_module().  The module
> (and its __spec__) should have everything exec_module() needs to do
> its job.

Yes, that should work.

> We would still need to use loader.supports_reload() in reload().

Why? If the reload isn't supported, exec_module can just throw an exception
based on the loader state in the spec.

From the import system's point of view "reload not permitted" is no
different from any other exec time failure.

Cheers,
Nick.


Re: [Python-Dev] PEP 451 update

2013-10-30 Thread Eric Snow
On Wed, Oct 30, 2013 at 4:09 PM, Nick Coghlan  wrote:
> On 31 Oct 2013 03:41, "Eric Snow"  wrote:
>> Our recent discovery about reloading should probably be reflected in
>> the signature of finder.find_spec():
>>
>>   MetaPathFinder.find_spec(name, path=None, existing=None)
>>   PathEntryFinder.find_spec(name, existing=None)
>>
>> This way the finder has an opportunity to incorporate information from
>> an existing spec into the spec it returns.  reload() would make use of
>> this by passing module.__spec__ (or None if the module has no
>> __spec__) to _bootstrap._find_spec().
>>
>> This approach should also address what you are looking for.  I'd
>> prefer it over passing the existing spec to exec_module().  The module
>> (and its __spec__) should have everything exec_module() needs to do
>> its job.
>
> Yes, that should work.

Cool.  I'll update the PEP.

>
>> We would still need to use loader.supports_reload() in reload().
>
> Why? If the reload isn't supported, exec_module can just throw an exception
> based on the loader state in the spec.

At the point that exec_module() gets called, the loader can't check
sys.modules to see if it's a reload or not.  As a workaround, the
finder could set up some loader state to indicate to the loader that
it's a reload and then the loader, during exec_module(), would check
that and act accordingly.  However, that's the sort of boilerplate
that PEP 451 is trying to offload onto the import machinery.  With
Loader.supports_reload() it's a lot cleaner.
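A hedged sketch of the boilerplate being described, with illustrative names
(this is not PEP 451 API): the finder records whether the module is already
present in sys.modules into the spec's loader_state, and exec_module()
consults it.

```python
import sys
import types
from importlib.machinery import ModuleSpec

class NoReloadLoader:
    """Illustrative loader that refuses reloads via loader_state."""

    def spec_for(self, name):
        # The finder side: stash "is this a reload?" at find time.
        return ModuleSpec(name, self,
                          loader_state={"is_reload": name in sys.modules})

    def exec_module(self, module):
        state = module.__spec__.loader_state or {}
        if state.get("is_reload"):
            raise ImportError("this loader does not support reloading")
        # ... normal module execution would happen here ...
```

This is exactly the per-loader bookkeeping that a supports_reload() hook, or
passing the previous spec to find_spec(), would make unnecessary.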

-eric

>
> From the import system's point of view "reload not permitted" is no
> different from any other exec time failure.
>
> Cheers,
> Nick.


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-30 Thread Victor Stinner
New update of the PEP combining various remarks:

* Remove GroupedStats class and Snapshot.group_by(): replaced with a
new Snapshot.statistics() method which combines all features
* Rename reset() to clear_traces() and explain how to get traces
before clearing traces
* Snapshot.apply_filters() now returns a new Snapshot instance
* Rename Filter.include to Filter.inclusive
* Rename Filter.traceback to Filter.all_frames
* Add a section "Log calls to the memory allocator"

Thanks Jim, Charles-François and Kristjan for your feedback!


Here is the new section, tell me if it sounds good. I didn't implement
logging just to compare performance; "slower than" reflects previous
experience with very basic logging code. Tell me if you disagree.

@Kristjan: I understood that you implemented a tool to log calls on a
Playstation3 and send them over the network. How do you process so
much data (I computed 29 GB/hour)? Do you log all calls, or only a few
of them?


Rejected Alternatives
=====================

Log calls to the memory allocator
---------------------------------

A different approach is to log calls to the ``malloc()``, ``realloc()`` and
``free()`` functions. Calls can be logged into a file or sent to another
computer through the network. Example of a log entry: name of the
function, size of the memory block, address of the memory block, Python
traceback where the allocation occurred, timestamp.

Logs cannot be used directly: getting the current status of the memory
requires parsing previous logs. For example, it is not possible to get
directly the traceback of a Python object, as
``get_object_traceback(obj)`` does with traces.

Python uses objects with a very short lifetime and so makes extensive
use of memory allocators. It has an allocator optimized for small
objects (less than 512 bytes) with a short lifetime.  For example, the
Python test suite calls ``malloc()``, ``realloc()`` or ``free()``
270,000 times per second on average. If the size of a log entry is 32
bytes, logging produces 8.2 MB per second or 29.0 GB per hour.

The alternative was rejected because it is less efficient and has fewer
features. Parsing logs in a different process or on a different computer is
slower than maintaining traces on allocated memory blocks in the same
process.
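As a sanity check, the volume figures in the section above can be reproduced
with back-of-the-envelope arithmetic (they match binary megabytes/gigabytes):

```python
calls_per_second = 270000  # malloc/realloc/free calls per second (test suite)
entry_size = 32            # assumed bytes per log entry

bytes_per_second = calls_per_second * entry_size
mb_per_second = bytes_per_second / 2**20          # ~8.2 MiB/s
gb_per_hour = bytes_per_second * 3600 / 2**30     # ~29.0 GiB/hour
```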


Victor


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-30 Thread Victor Stinner
2013/10/31 Victor Stinner :
> Log calls to the memory allocator
> ---------------------------------
>
> A different approach is to log calls to the ``malloc()``, ``realloc()`` and
> ``free()`` functions. Calls can be logged into a file or sent to another
> computer through the network. Example of a log entry: name of the
> function, size of the memory block, address of the memory block, Python
> traceback where the allocation occurred, timestamp.
>
> Logs cannot be used directly: getting the current status of the memory
> requires parsing previous logs. For example, it is not possible to get
> directly the traceback of a Python object, as
> ``get_object_traceback(obj)`` does with traces.
>
> Python uses objects with a very short lifetime and so makes extensive
> use of memory allocators. It has an allocator optimized for small
> objects (less than 512 bytes) with a short lifetime.  For example, the
> Python test suite calls ``malloc()``, ``realloc()`` or ``free()``
> 270,000 times per second on average. If the size of a log entry is 32
> bytes, logging produces 8.2 MB per second or 29.0 GB per hour.
>
> The alternative was rejected because it is less efficient and has fewer
> features. Parsing logs in a different process or on a different computer is
> slower than maintaining traces on allocated memory blocks in the same
> process.

"fewer features": get_object_traceback(obj), get_traces() and
Snapshot.statistics() can be computed from the log, but you have to
process a lot of data.

How much time does it take to compute statistics on 1 hour of logs?
And for 1 week of logs? With tracemalloc you get this information in
a few seconds (immediately for get_object_traceback()).

It should be possible to compute statistics every N minutes and store
the result, so that the whole log file does not have to be parsed at once.

Victor


Re: [Python-Dev] PEP 451 update

2013-10-30 Thread Nick Coghlan
On 31 Oct 2013 08:54, "Eric Snow"  wrote:
>
> On Wed, Oct 30, 2013 at 4:09 PM, Nick Coghlan  wrote:
> > On 31 Oct 2013 03:41, "Eric Snow"  wrote:
> >> Our recent discovery about reloading should probably be reflected in
> >> the signature of finder.find_spec():
> >>
> >>   MetaPathFinder.find_spec(name, path=None, existing=None)
> >>   PathEntryFinder.find_spec(name, existing=None)
> >>
> >> This way the finder has an opportunity to incorporate information from
> >> an existing spec into the spec it returns.  reload() would make use of
> >> this by passing module.__spec__ (or None if the module has no
> >> __spec__) to _bootstrap._find_spec().
> >>
> >> This approach should also address what you are looking for.  I'd
> >> prefer it over passing the existing spec to exec_module().  The module
> >> (and its __spec__) should have everything exec_module() needs to do
> >> its job.
> >
> > Yes, that should work.
>
> Cool.  I'll update the PEP.
>
> >
> >> We would still need to use loader.supports_reload() in reload().
> >
> > Why? If the reload isn't supported, exec_module can just throw an
> > exception based on the loader state in the spec.
>
> At the point that exec_module() gets called, the loader can't check
> sys.modules to see if it's a reload or not.  As a workaround, the
> finder could set up some loader state to indicate to the loader that
> it's a reload and then the loader, during exec_module(), would check
> that and act accordingly.  However, that's the sort of boilerplate
> that PEP 451 is trying to offload onto the import machinery.  With
> Loader.supports_reload() it's a lot cleaner.

There's also the option of implementing the constraint directly in the
finder, which *does* have the necessary info (with the change to pass the
previous spec to find_spec).

I still think it makes more sense to leave this out for the moment - it's
not at all clear we need the extra method, and adding it later would be a
straightforward protocol update.

Cheers,
Nick.

>
> -eric
>
> >
> > From the import system's point of view "reload not permitted" is no
> > different from any other exec time failure.
> >
> > Cheers,
> > Nick.


Re: [Python-Dev] PEP 454 (tracemalloc) disable ==> clear?

2013-10-30 Thread Stephen J. Turnbull
Jim Jewett writes:

 > Later, he wrote:
 > > I don't see why disable() would return data.
 > 
 > disable is indeed a bad name for something that returns data.

Note that I never proposed that disable() *return* anything, only that
it *get* the trace.  It could store it in some specified object, or a
file, rather than return it, for example.  I deliberately left what it
does with the retrieved data unspecified.  The important thing to me
is that it not be dropped on the floor by something named "disable".