John Salerno <[EMAIL PROTECTED]> wrote: >Roy Smith wrote: > >> One may be marginally faster, but they both require copying the source >> string, and are thus both O(n). > >Sorry, I'm not familiar with the O(n) notation.
OK, here's a quick tutorial to "big-oh" notation. It's a way of measuring algorithmic complexity. The O stands for "on the Order of". For any algorithm, if you process n things (in the case of the strings we're talking about, n would be the total number of characters in all the strings), you can compute the number of steps it takes to complete all the processing as some function of n. For example, let's say a given algorithm took 4n^2 + 5n + 17 steps to complete. It doesn't take much experimentation to prove to yourself that for all but the very smallest values of n, the constant 17 is completely insignificant. In fact, n doesn't have to get very big before the 5n term becomes insignificant too. To a reasonable approximation, you could say that the algorithm takes 4n^2 steps and be close enough. For small values of n, this will be a bad approximation, but it doesn't really matter for small values of n. For large values of n (think hundreds, thousands or even millions), it's just fine. So, the first rule of O() notation is that when looking at how fast an algorithm runs, you need only consider the highest order term (i.e. the one with the biggest exponent). But, it gets even better. Let's compare two algorithms, the one above, which takes (approximately) 4n^2 steps, and another one which takes 10n steps, for varous values of n: n 10n 4n^2 --- ------ ------ 1 10 4 2 20 16 3 30 36 4 40 64 5 50 100 6 60 144 7 70 196 8 80 256 9 90 324 10 100 400 11 110 484 12 120 576 13 130 676 14 140 784 15 150 900 16 160 1024 17 170 1156 18 180 1296 19 190 1444 20 200 1600 Notice that it doesn't take long for the fact that n^2 grows a lot faster than n to swamp the fact that 10 is bigger than 4. So the next step in simplification is to say that not only don't the lower-order terms matter, but the coefficient on the highest order term doesn't even matter. For a large enough n, all that matters is the exponent (for the moment, I'm making a further simplification that all algorithms can be described by polynomials with integer exponents). So, we get things like: O(n^0), which is almost always written as O(1). This is a "constant time" algorithm, one which takes the same amount of steps to execute no matter how big the input is. For example, in python, you can write, "x = 'foo'". That assignment statement takes the same amount of time no matter how long the string is. All of these execute in the same number of steps: x = '' x = 'foo' x = 'a very long string with lots and lots of characters' We can say that "assignment is constant time", or "assignment is O(1)". The next step up is O(n). This is "linear time" algorithm. It takes a number of steps which is directly proportional to the size of the input. A good example might be "if 'x' in 'bar'". The obvious way to implement this (and, I assume, the way it is implemented in Python), is to loop over the string, comparing 'x' to each character in 'bar', one by one. This takes as many steps as there are characters in the string. Next comes O(n^2), also knows as "quadratic time". This means if your input is of size n, the algorithm will take n^2 steps to run. Quadratic algorithms are generally considered very slow, and to be avoided if at all possible. Once you get used to thinking like this, it's easy to look at a piece of code and say, "oh, that's quadratic", or "that's linear", and instantly know which is faster for some some large input. And once you've started thinking like that, you've made one of the big jumps from thinking like a coder to thinking like a computer scientist (or, if you prefer, like a software engineer). Google "algorithmic complexity" or "big o notation" (perhaps spelled "big oh") for more info. I would normally recommend Wikipedia, but I just took a quick look at their "Big O notation" article, and noticed it's full of formal mathematical gobbledygook which just serves to obscure what is really a fairly easy concept. -- http://mail.python.org/mailman/listinfo/python-list