On Wed, Sep 5, 2012 at 2:32 AM, Johannes Bauer <dfnsonfsdu...@gmx.de> wrote:
> How do you arrive at that conclusion? When comparing two random strings,
> I just derived
>
> n = (256 / 255) * (1 - 256 ^ (-c))
>
> where n is the average number of character comparisons and c is the
> string length. The rationale is as follows: the first character has to
> be compared in any case, the second with a probability of 1/256, the
> third with 1/(256^2), and so on.
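The quoted derivation is easy to check empirically. Below is a minimal sketch (the helper names are mine, not from the thread; `random.randbytes` needs Python 3.9+) that compares pairs of random byte strings, counts how many character comparisons a naive left-to-right equality check performs, and checks the sample mean against the closed form (256/255) * (1 - 256^(-c)):

```python
import random

def count_comparisons(a: bytes, b: bytes) -> int:
    """Count character comparisons a naive equality check performs,
    stopping at the first mismatch."""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            break
    return n

def expected_comparisons(c: int) -> float:
    # Closed form of the geometric sum 1 + 1/256 + 1/256**2 + ... (c terms)
    return (256 / 255) * (1 - 256 ** (-c))

random.seed(0)
c = 8
trials = 200_000
total = sum(
    count_comparisons(random.randbytes(c), random.randbytes(c))
    for _ in range(trials)
)
print(total / trials, expected_comparisons(c))
```

For uniformly random bytes the mean comes out just above 1 (256/255 is about 1.004), which is exactly the point being made: random data almost always mismatches on the first character.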
That would be for comparing two random areas of memory. Python strings don't have just 256 options per character, and real text has far more possibilities in principle; yet the strings a program actually compares for equality tend to draw on a vastly restricted alphabet, often only a few dozen plausible characters. Even so, the cost scales approximately linearly with string length. If the strings really are random, then yes, there's little chance that either a 1MB string or a 2MB string will match beyond the first few characters; but with real data, they might very well share a long common prefix. So it's still going to be more or less O(n).

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list
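The long-common-prefix case can be sketched the same way (the path strings here are made up for illustration): a naive left-to-right check must walk the entire shared prefix before it finds the difference, so the work grows with prefix length rather than staying near one comparison.

```python
def comparisons(a: str, b: str) -> int:
    """Count character comparisons a naive left-to-right equality check
    makes. (CPython compares lengths first, which is O(1); ignored here.)"""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            break
    return n

# Realistic data often shares structure, e.g. file paths under one directory:
prefix = "/home/user/project/data/"
a = prefix + "file_0001.csv"
b = prefix + "file_0002.csv"
print(comparisons(a, b))  # walks the whole shared prefix before mismatching
```

Here the comparison count exceeds the prefix length, versus the ~1 comparison the random-string model predicts, which is why the average-case argument doesn't rescue real-world comparisons from O(n).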