On Wed, Sep 5, 2012 at 2:32 AM, Johannes Bauer <dfnsonfsdu...@gmx.de> wrote:
> How do you arrive at that conclusion? When comparing two random strings,
> I just derived
>
> n = (256 / 255) * (1 - 256 ^ (-c))
>
> where n is the average number of character comparisons and c the
> string length. The rationale is as follows: the first character has
> to be compared in any case, the second with a probability of 1/256,
> the third with 1/(256^2), and so on.
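The closed form above is just the geometric series sum_{k=0}^{c-1} 256^(-k). A quick sketch to check it numerically (the function names are mine, and the model assumes uniformly random bytes, as in the derivation):

```python
import random

# Closed-form expectation from the derivation above:
# n = (256/255) * (1 - 256**(-c))
def expected_comparisons(c):
    return (256 / 255) * (1 - 256 ** (-c))

# Monte Carlo check: count character comparisons performed when
# comparing two uniformly random byte strings of length c.
def simulate(c, trials=100000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(256) for _ in range(c)]
        b = [rng.randrange(256) for _ in range(c)]
        for x, y in zip(a, b):
            total += 1
            if x != y:  # mismatch: equality check stops here
                break
    return total / trials

print(expected_comparisons(8))  # ~1.0039
print(simulate(8))              # should land close to the closed form
```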

That would be for comparing two random areas of memory. Python strings
don't have 256 equally likely options per character; in terms of
actual strings, the distribution is nothing like uniform. The strings
a program is going to compare for equality draw on a vastly restricted
alphabet; in a lot of cases, there might be only a few dozen plausible
characters.

But even so, it's going to scale approximately linearly with the
string length. If they're really random, then yes, there's little
chance that either a 1MB string or a 2MB string will be the same, but
with real data, they might very well have a long common prefix. So
it's still going to be more or less O(n).

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
