"John Salerno" <[EMAIL PROTECTED]> writes:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> the above code is pretty much of a no-no because it has quadratic
>> runtime behavior.
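(The code being criticized isn't quoted in this reply. For readers joining mid-thread, an order-preserving duplicate removal of the kind under discussion typically looks like the first function below; the set-based variant discussed further down is the second. Both function names are made up for illustration.)

```python
def dedupe_quadratic(items):
    # The pattern being criticized: 'item not in new' is a linear
    # scan of the list, so the whole loop is O(n**2) in the worst case.
    new = []
    for item in items:
        if item not in new:
            new.append(item)
    return new


def dedupe_linear(items):
    # The fix: keep a set of already-seen items alongside the list.
    # Set membership tests are O(1) on average, so the loop is O(n).
    seen = set()
    new = []
    for item in items:
        if item not in seen:
            seen.add(item)
            new.append(item)
    return new
```

Both return the unique items in first-seen order, e.g. [3, 1, 2] for the input [3, 1, 3, 2, 1]; only their running time differs.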
> What's that mean, exactly?

It means that the running time increases with the square of the problem
size: 2.000.000 items will take 4 times as long as 1.000.000 items, and
5.000.000 items will take 25 times as long as 1.000.000 items.

> Are you referring to both examples, or just the
> second one?

The problem is that the lookup you did to see if an item had already
been seen is done by a linear search of a list. The search time is
proportional to the number of items on the list 'new', but since the
number of times you do that search also increases with the length of
the input, the total runtime of your function grows even faster. This
applies to both your examples, since they really do the same thing.

However, it actually depends on what your input is. For the runtime of
your function to increase with the square of the input length, the
number of *unique* items in the input must be a constant fraction of
the input length. By that I mean, for example, that 5% of the input
items are unique, so that if your input is 100 items, the list 'new'
will be 5 items at the end of the function, and if your input is
5.000.000 items, 'new' will grow to 250.000 items.

This might not be the case in your use. If you know, for example, that
the input consists only of integers between 0 and 10, then 'new' will
never become longer than 11 items, regardless of whether the input is
20 or 20.000.000.000 items. The runtime of your function then suddenly
becomes linear in the problem size instead of quadratic.

Even so, maintaining the set of already seen items as a set is likely
to be faster than keeping them in a list, even for a fairly small
number of unique items. One small exception to this, though.
If maintaining only one data structure (the ordered list 'new' in your
code) makes your working set fit within the processor cache, while
maintaining two data structures (a list for keeping the order, and a
set for searching) makes it *not* fit within the cache, then it *might*
actually be faster to just maintain the list. Unlikely, but not
entirely impossible, and a small change in the problem size can tip the
balance again.

-- 
Thomas Bellman,  Lysator Computer Club,  Linköping University,  Sweden
"What sane person could live in this world  !  bellman @ lysator.liu.se
 and not be crazy?" -- Ursula K LeGuin      !  Make Love -- Nicht Wahr!
-- 
http://mail.python.org/mailman/listinfo/python-list