On Oct 28, 2013, at 1:16 PM, Victor Stinner <victor.stin...@gmail.com> wrote:
> so what is the > status of the PEP 455 (TransformDict)? I'm giving a thorough evaluation of the proposal and am devoting chunks of time each weekend to reviewing the email threads, the links provided in the PEPs, looking at how well the TD fits in existing code. I'm giving this work priority over my own list of things to add to 3.4 (most of which will now have to wait until 3.5). This week, I'm teaching a five-day intermediate python class to highly experienced network engineers in Denver. We'll do some exercises using the TD and evaluate the results against alternative approaches. Here are some preliminary notes (in no particular order): * A first reading of the python-dev threads suggests that the two-dict TD implementation seems better suited to implementing a case-folding-case-preserving dictionary and is far less well suited for a case-folding dict or an identity dict. * There are interesting differences between the proposed TD and the CaseInsensitiveDict implemented in Kenneth Reitz's HTTP requests library. The latter keeps the last key added rather than the first. It also has a cleaner implementation and the API is a bit nicer (no getitem() method). * The "originals" dict maps a transformed key back to its first saved original value. An alternative would be to map back to a set of original values or a list of original values. * A possible use case is for a unicode normalizing dictionary where 'L' + chr(111) + chr(776) + 'wis' would match 'L' + chr(246) + 'wis'. * The implementation looks rough at this point, but that is easily fixed-up. I'll leave some specific suggestions on the tracker (getting it to accept a list of tuples in the constructor, a recursive repr, renaming the getitem() method, deprivatizing the attributes, getting it to work with __missing__, etc). * Having two-mappings-in-one seems to be the most awkward part of the design and wouldn't be needed for the two strongest use cases, a case-insensitive-but-not-case-preserving dict and an identity dict. * In http://stackoverflow.com/questions/13230414, the OP wants a CI dict but doesn't need the case preserving aspect. The OP is provided with a regular dictionary containing mixed case keys and needs to compare a list of potential matches of unknown case. Copying the dict to a case-folding TD wouldn't provide any incremental advantage over building a regular dict with lower case keys. * In http://www.gossamer-threads.com/lists/python/python/209527, the OP wanted an ID comparison for a symbolic calculation. The objects already had an eq function and he wanted to temporarily bypass that in a symbol lookup. Essentially he needed a dictionary that would allow the use of an alternative equality function. A TD would work here but there would be no need for the key preserving feature. There doesn't seem to be any advantage over using a regular dict that directly stores the id() as the key. * In https://mail.python.org/pipermail/python-ideas/2010-May/007235.html, the OP wants a highly optimized identity dictionary that doesn't call an expensive id() function. The proposed TD doesn't fulfill this request -- it would be far from being optimized and would call id() on every store and every lookup. * In http://msdn.microsoft.com/en-us/library/xfhwa508.aspx, the API describes a dict with an alternative equality comparison. This is a different design than the TD and is for a world that is somewhat different from Python. In that dict, eq and hash are specified at the dict level rather than object level (in Python, the key objects define their own __eq__ and __hash__ rather than having the user attach the functions directly to the dictionary). * In http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html, the IdentityHashMap is described as being for rare use cases such as topology-preserving object graph transformations likes serialization or deep-copying. Looking at Python's own code for copy.deepcopy(), the TD would not be a good replacement for the existing code (more memory intensive, slower, no use for the originals dict, and no use for most of the TD functionality). It appears that it is simpler and faster to store and lookup d[id(obj)] than to use a TD. * If this were being modeled in a database, we would have one table with a many-to-one mapping of original keys to transformed keys and another table with a transformed key as the primary key in a table of key-value pairs. This suggests two lookup patterns original->tranformed->value and transformed->all_originals. * The Apache case insensitive dict documentation includes these thoughts: "This map will violate the detail of various Map and map view contracts. As a general rule, don't compare this map to other maps. In particular, you can't use decorators like ListOrderedMap on it, which silently assume that these contracts are fulfilled. --- Note that CaseInsensitiveMap is not synchronized and is not thread-safetiveMap.html --- This map will violate the detail of various Map and map view contracts. As a general rule, don't compare this map to other maps. In particular, you can't use decorators like ListOrderedMap on it, which silently assume that these contracts are fulfilled." * Using ACK to search through Django, it looks like there over a half-dozen case-insensitive lookups with are implemented using d[k.lower()] and wouldn't be better-off using the proposed TD. The notes above are rough and I still have more work to do: * close reading of the remaining links in the PEP * another round of user testing this week (this time with far more experienced engineers) * review the long python-dev thread in more detail * put detailed code suggestions on the tracker All of this will take some time. I understand that the 3.4 feature deadline is looming, but I would like to take the time to thoroughly think this through and make a good decision. If I had to choose right now, a safe choice would be to focus on the primary use case and implement a clean CaseInsensitiveDict without the double-dict first-saved case-preserving feature. That said, I find the TD to be fascinating and there's more work to do before making a decision. Hopefully, this post will make the thought process more transparent. Cheers, Raymond
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com