On 7 November 2017 at 16:21, Steven D'Aprano <st...@pearwood.info> wrote:
> On Mon, Nov 06, 2017 at 08:05:07PM -0800, David Mertz wrote:
>> Maybe OrderedDict can be rewritten to use the dict implementation.
>> But the evidence that all implementations will always be fine with
>> this restraint feels poor,
>
> I think you have a different definition of "poor" to me :-)
While I think "poor" is understating the case, I think "excellent" (which
you use later on) is overstating it. My own characterisation would be "at
least arguably good enough".

> Nick has already done a survey of PyPy (which already has insertion-
> order preserving dicts), Jython, VOC, and Batavia, and they don't have
> any problem with this.

For these, my research only showed that their respective platforms have an
order-preserving hashmap implementation available. What's entirely unclear
at this point is how switching wholesale to that implementation may affect
the *performance* of these interpreters (both in terms of speed and memory
usage), and how much code churn would be involved in actually making the
change.

Making the builtin dict() order-preserving may also still impose an ongoing
complexity cost on these implementations if they end up having to split
"the mapping we use for code execution namespaces" away from "the mapping
we provide as the builtin dict". (That said, I think at least Jython
already makes that distinction - I believe their string-only namespace dict
is a separate type, whereas CPython plays dynamic optimisation games inside
the regular dict type.)

So Barry's suggestion of providing an explicit "collections.UnorderedDict"
as a consistent spelling for "an associative mapping without any ordering
guarantees" is a reasonable one, even if its primary usage in CPython ends
up being to ensure algorithms are compatible with collections that don't
provide an inherently stable iteration order, with any associated
performance benefits mostly seen on other implementations. (As Paul S
notes, such a data type would also serve a pedagogical purpose when
teaching computer science principles.)

> IronPython is built on C#, which has order-
> preserving mappings.

I couldn't actually find a clearly suitable existing collection type in
the .NET CLR - the one I linked was just the one commonly referenced as
"good enough for most purposes".
It had some constraints that meant it might not be suitable as the builtin
dict type in a Python implementation (e.g. it looked to have a 32-bit
length limit).

> Nuitka is built on C++, and if C++ can't implement
> an order-preserving mapping, there is something terribly wrong with the
> world. Cython (I believe) uses CPython's implementation, as does
> Stackless.

Right, the other C/C++ implementations also target environments with at
least 128 MiB of RAM (and typically much more), so they can reasonably be
expected to tolerate space/speed trade-offs similar to those that CPython
itself makes (and that's assuming they aren't just using CPython's data
type implementations in the first place).

> The only well-known implementation that may have trouble with this is
> MicroPython, but it already changes the functionality of a lot of
> builtins and core language features, e.g. it uses a different method
> resolution order (so multiple inheritance won't work right), some
> builtins don't support slicing with three arguments, etc.
>
> I think the evidence is excellent that other implementations shouldn't
> have a problem with this, unless (like MicroPython) they are targeting
> machines with tiny memory resources. µPy runs on the PyBoard, which I
> believe has under 200K of memory. I think we can all forgive µPy if it
> only *approximately* matches Python semantics.

It runs on the ESP8266 as well, and that only has 96 KiB of data memory.
This means we're already talking about 3-4 orders of magnitude difference
in memory capacity and typical data set sizes between CPython and
MicroPython use cases, and that's only accounting for the *low* end of
CPython's use cases - once you expand out to multi-terabyte data sets (the
very upper end of what a single x86-64 server can handle, if you can
afford the RAM for it), we're talking 9-10 orders of magnitude between
CPython's high end and MicroPython's low end.
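To make the "no ordering guarantees" idea concrete, here's a rough sketch
of what a collections.UnorderedDict along the lines of Barry's suggestion
could look like. The name and the shuffle-on-iteration trick are purely
illustrative assumptions on my part - a real implementation would simply
not track insertion order at all - but even this toy version is handy for
flushing out code that silently relies on insertion order:

```python
import random

class UnorderedDict(dict):
    # Illustrative sketch only: same contents as a regular dict, but
    # with no stable iteration order. We fake the absence of ordering
    # guarantees by shuffling the keys on every iteration.
    def __iter__(self):
        keys = list(dict.__iter__(self))
        random.shuffle(keys)
        return iter(keys)

    def keys(self):
        return list(self)

    def items(self):
        return [(k, self[k]) for k in self]

    def values(self):
        return [self[k] for k in self]

d = UnorderedDict(a=1, b=2, c=3)
# Contents are unaffected; only the iteration order is unpredictable.
assert sorted(d) == ["a", "b", "c"]
assert dict(d.items()) == {"a": 1, "b": 2, "c": 3}
```

Running a test suite against a type like this is a cheap way to check that
an algorithm really is order-independent, regardless of what the
underlying platform's hashmap happens to provide.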
So for CPython's target use cases, algorithmic efficiency dominates
performance, and we can afford to invest extra memory usage and startup
overhead in service of more efficient data access algorithms. MicroPython
is the opposite - you're going to run out of memory for data storage long
before algorithmic inefficiency becomes your biggest problem, but wasted
bookkeeping memory and startup overhead can cause problems even with small
data sets.

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia