On 09/06/2012 04:33 AM, Steven D'Aprano wrote: > <snip> > > I may have been overly-conservative earlier when I said that on > average string equality has to compare half the characters. I thought > I had remembered that from a computer science textbook, but I can't > find that reference now, so possibly I was thinking of something else. > (String searching perhaps?). In any case, the *worst* case for string > equality testing is certainly O(N) (every character must be looked > at), and the *best* case is O(1) obviously (the first character fails > to match). But I'm not so sure about the average case. Further thought > is required.
For random strings (as defined below), the average compare time is effectively unrelated to the size of the string, once the size passes some point. Define random string as being a selection from a set of characters, with replacement. So if we pick some set of characters, say 10 (or 256, it doesn't really matter), the number of possible strings is 10**N. The likelihood of not finding a mismatch within k characters is (1/10)**k The likelihood of actually reaching the end of the random string is (1/10)**N. (which is the reciprocal of the number of strings, naturally) If we wanted an average number of comparisons, we'd have to calculate a series, where each term is a probability times a value for k. sum((k * 9*10**-k) for k in range(1, 10)) Those terms very rapidly approach 0, so it's safe to stop after a few. Looking at the first 9 items, I see a value of 1.1111111 This may not be quite right, but the value is certainly well under 2 for a population of 10 characters, chosen randomly. And notice that N doesn't really come into it. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list