On Sat, 01 Nov 2008 18:58:59 +0000, Tim Rowe wrote:

> 2008/10/27 <[EMAIL PROTECTED]>:
>> Lie Ryan:
>>
>>> Oh no, the two dict implementation would work _exactly_ the same from
>>> the outside, they are transparently interchangeable. Only the
>>> performance characteristic differs because of the different
>>> implementation.<
>>
>> I don't agree with the general idea. If the operations done by your
>> data structure have different computational complexity, then they are
>> fit for different usages. When you program you must know what
>> computational complexity each of the operations of your data
>> structure has, otherwise there's no way to know the complexity of your
>> whole program, so instead of programming you just become a mage
>> who tries magical spells and hopes for the best.
>
> No, when you program you know how you will be using the data structure,
> so you can choose the implementation that's right for the application.
I don't understand why you have prefixed that sentence with "No". It
seems like you are agreeing with bearophile's point.

> That's what the container libraries for other languages do. At the
> moment, you just have one implementation, and have to hope it's right
> for the job.

To the extent that this is true, it is a sign of the poverty of the
standard Python library. But in fact it isn't true. Python has quite a
decent collection of collection types. Just off the top of my head:

    tuple, namedtuple, list, set, frozenset, dict, defaultdict,
    queue, deque

set, list and dict are highly optimized for general use, and are very
suitable for building new data structures. And you are wrong to say
that we have to "hope it's right for the job": we can build perfectly
acceptable pure-Python data structures from these building blocks, or
add our own C extensions. You're never forced to use a standard library
data structure; it's a trade-off of your time and effort against
runtime efficiency. And because the standard library types are
generally so excellent, that trade-off is often (but not always) a
no-brainer.

> Adding an *optional* parameter that says, in effect, "I
> want this list optimised for writes and reads at both ends" or "I want
> this list optimised for fully random reads but writes only at the end"
> doesn't *lose* you any information about what you're programming with.

It might not lose information, but it does obfuscate it. Right now, we
can do this:

    if isinstance(obj, list):
        print "optimized for random reads and writes at the end"
    elif isinstance(obj, deque):
        print "optimized for writes at both the start and end"

If you care about such things, it's easy to find out, because there is
a one-to-one mapping from type to behaviour.
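(An aside, not from the original thread: a quick sketch of my own, with
made-up names like push_left_list, showing how that type-to-behaviour
mapping matters in practice. Inserting at the front of a list shifts
every existing element, so it is O(n) per operation, while
deque.appendleft is O(1) -- yet both produce the same sequence.)

```python
from collections import deque

def push_left_list(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # shifts every existing element: O(n) each time
    return items

def push_left_deque(n):
    items = deque()
    for i in range(n):
        items.appendleft(i)  # constant time per append
    return items

# Identical functional result; only the cost per operation differs.
assert push_left_list(5) == [4, 3, 2, 1, 0]
assert list(push_left_deque(5)) == [4, 3, 2, 1, 0]
```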
Under Lie's suggestion, we would have a one-to-many mapping from type
to behaviour, forcing us to do this:

    if isinstance(obj, list):
        if obj.implementation == 'deque':
            print "optimized for writes at both the start and end"
        else:
            print "optimized for random reads and writes at the end"

That's assuming the implementation is available at runtime at all; if
it's not, then you have lost information.

> Of course it's essential that the data structure has identical
> /functional/ behaviour whatever optimisation you use.

"Essential"? That's an interesting point. Let's look at an actual
example. Consider one of Lie's initial examples (paraphrased):

    D1 = dict({1: 'a', 2: 'b'})  # standard hash-table
    D2 = dict({1: 'a', 2: 'b'}, implementation="binary tree")

You say it's "essential" that the binary tree implementation has
identical functional behaviour to the standard hash-table
implementation of dict. Okay, let's see where that leads us. The
functional behaviour of hash-tables is that they are O(1) for reads,
while binary trees are O(log(N)). The only way for them to have
identical functional behaviour is to *artificially* slow down dicts to
the lowest common speed. That is a Bad Thing, and I don't understand
why you think it is essential.

But perhaps you didn't actually mean identical behaviour, merely an
identical *interface*. That's more reasonable: we no longer have to
artificially change the behaviour of the type -- but we do have to
artificially cripple the API of the type! The advantage of binary
trees as a mapping is that they are ordered. Binary trees have a rich
set of traversal methods: you can walk the tree in post-order,
pre-order, in-order or depth first. But hash-tables don't have *any*
of those, so now we have to forbid all binary-tree traversal methods.
In which case, why bother paying the cost of binary trees if you don't
get the extra functionality?

But maybe you'll argue that Lie's examples are bad examples.
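(To make the "extra functionality" point concrete, here is a toy
ordered-mapping sketch of my own -- BSTMap and its method names are
invented for illustration, not any real library API. An in-order walk
of a binary search tree yields the keys in sorted order, a promise a
plain hash-table dict simply doesn't make.)

```python
class Node(object):
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

class BSTMap(object):
    """Toy unbalanced binary-search-tree mapping, for illustration only."""
    def __init__(self):
        self.root = None

    def __setitem__(self, key, value):
        def insert(node):
            if node is None:
                return Node(key, value)
            if key < node.key:
                node.left = insert(node.left)
            elif key > node.key:
                node.right = insert(node.right)
            else:
                node.value = value   # replace existing value
            return node
        self.root = insert(self.root)

    def items_inorder(self):
        # In-order walk: left subtree, then node, then right subtree,
        # so keys come out sorted.
        def walk(node):
            if node is not None:
                for pair in walk(node.left):
                    yield pair
                yield (node.key, node.value)
                for pair in walk(node.right):
                    yield pair
        return walk(self.root)

t = BSTMap()
for k in [2, 1, 3]:
    t[k] = str(k)
assert list(t.items_inorder()) == [(1, '1'), (2, '2'), (3, '3')]
```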
Perhaps what you have in mind is something like the difference between
"dicts implemented as hash tables with chaining" and "dicts implemented
as hash tables with open addressing". (Aside: what sort of open
addressing? Linear? Quadratic? Something else?) At least now we're
talking about different implementations with the same API and
more-or-less the same functional behaviour.

I'd suggest that 99% of the time, client code simply shouldn't care
about the differences. The existing implementations are Good Enough.
Have you *ever* found yourself saying "I can't use the built-in dict,
because it uses chaining and for my application I need linear open
addressing"? (Or vice versa.) I doubt you've ever said that. I doubt
you know anyone who has said it. I doubt you know *of* anyone who has
said it. And even if you have, then you are free to go and write your
own dict type as a C extension type, call it
dict_with_linear_openaddressing, and write this:

    D = dict_with_linear_openaddressing(data)

instead of:

    D = dict(data, implementation="linear open addressing")

Yet again Lie's proposal gains you nothing. In fact, as an application
writer, you're better off writing your own type rather than waiting for
it to be added to the built-in implementation of dict.

The one exception to this is when you're developing a *new type* that
doesn't already exist in the standard library, and you haven't decided
which implementation(s) you should use, so you write multiple
implementations for testing.
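(For anyone unfamiliar with the jargon, here is a heavily simplified
sketch of my own of linear open addressing -- no resizing, no deletion,
and it assumes the table never fills. The class and its names are
invented for illustration; CPython's real dict is far more
sophisticated.)

```python
class LinearOpenDict(object):
    """Toy hash table using linear open addressing: on collision,
    step forward one slot at a time until a free or matching slot."""
    def __init__(self, size=8):
        self.slots = [None] * size   # each slot holds None or (key, value)

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)   # linear probing: step by 1
        return i

    def __setitem__(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self.slots[self._probe(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]

d = LinearOpenDict()
d['a'] = 1
d['b'] = 2
assert d['a'] == 1 and d['b'] == 2
```

Client code indexing `d` can't tell whether collisions are resolved by
probing or by chaining; only the performance profile differs.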
Here's the wrong way to do it:

    class Foo(object):
        def __init__(self, data, implementation='standard'):
            self.implementation = implementation
            if implementation == 'standard':
                # save data one way
                self.data = foo(data)
            elif implementation == 'magic':
                # save data another way
                self.data = bar(data)
            elif implementation == 'green':
                # save data a third way
                self.data = baz(data)
            else:
                raise ValueError('unknown implementation')
        def __len__(self):
            if self.implementation == 'standard':
                return len(self.data)
            elif self.implementation == 'magic':
                return len(self.data[0]) + len(self.data[1:])
            elif self.implementation == 'green':
                return self._len

Why would you write it that way? Worst Design Ever -- it's confusing
and error prone, and it fails to obey the Don't Repeat Yourself
principle, thus increasing the risk of errors. And testing and
documentation are significantly more complicated.

Here's a better way:

    class Foo(object):
        class _StandardFoo(object):
            def __init__(self, data):
                self.data = foo(data)
            def __len__(self):
                return len(self.data)
        class _MagicFoo(object):
            def __init__(self, data):
                self.data = bar(data)
            def __len__(self):
                return len(self.data[0]) + len(self.data[1:])
        class _GreenFoo(object):
            def __init__(self, data):
                self.data = baz(data)
            def __len__(self):
                return self._len
        _mapping = {'standard': _StandardFoo, 'magic': _MagicFoo,
                    'green': _GreenFoo}
        def __new__(cls, data, implementation='standard'):
            try:
                return cls._mapping[implementation](data)
            except KeyError:
                raise ValueError('unknown implementation')

Here's an even better way:

    class StandardFoo(object):
        def __init__(self, data):
            self.data = foo(data)
        def __len__(self):
            return len(self.data)

    class MagicFoo(object):
        def __init__(self, data):
            self.data = bar(data)
        def __len__(self):
            return len(self.data[0]) + len(self.data[1:])

    class GreenFoo(object):
        def __init__(self, data):
            self.data = baz(data)
        def __len__(self):
            return self._len

and let the client code call whichever implementation it wants.
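(Since foo/bar/baz above are placeholders, here is a runnable variant
of my own of the same dispatch idea, with trivial stand-in storage
schemes and an ordinary factory function instead of __new__ tricks --
all names here are invented for illustration.)

```python
class StandardFoo(object):
    """Store the data as a plain list."""
    def __init__(self, data):
        self.data = list(data)
    def __len__(self):
        return len(self.data)

class MagicFoo(object):
    """Store the head and tail separately, just to differ structurally."""
    def __init__(self, data):
        data = list(data)
        self.data = (data[0], data[1:])
    def __len__(self):
        return 1 + len(self.data[1])

def make_foo(data, implementation='standard'):
    # One place maps implementation names to classes; each class
    # carries its own storage logic, so nothing is repeated.
    mapping = {'standard': StandardFoo, 'magic': MagicFoo}
    try:
        cls = mapping[implementation]
    except KeyError:
        raise ValueError('unknown implementation %r' % implementation)
    return cls(data)

assert len(make_foo([1, 2, 3])) == 3
assert len(make_foo([1, 2, 3], 'magic')) == 3
assert type(make_foo([1, 2, 3])) is StandardFoo
```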
[snip]

> They're not different data structures from the client point of view.

But of course they are. They have different functional behaviour, and
different APIs. Unless we artificially cripple the implementations and
make them all identical (which defeats the purpose of having different
implementations!) they are different data structures, and those
differences will be visible to the client code.

This scheme is a case of over-generalisation. It's also impractical:
who is going to spend the time and effort to do it? It's hard enough to
get somebody to write a binary tree implementation for the standard
library, without expecting them to *also* write a wrapper for it to
make it look like a dict. It's working against the grain of Python,
which is well suited to the one-type = one-API model, rather than the
one-type = many-APIs model which Lie is suggesting.

And for what benefit? The only benefit suggested so far is that we can
write:

    dict({1: 'a', 2: 'b'}, implementation="binary tree")

instead of:

    binarytree({1: 'a', 2: 'b'})

If you call that a benefit. I don't.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list