On Thu, 06 Sep 2012 06:07:38 -0400, Dave Angel <d...@davea.name> wrote:
For random strings (as defined below), the average compare time is
effectively unrelated to the size of the string, once the size
passes
some point.


Define random string as being a selection from a set of characters,
with
replacement. So if we pick some set of characters, say 10 (or 256,
it
doesn't really matter), the number of possible strings is 10**N.


The likelihood of not finding a mismatch within k characters is (1/10)**k The likelihood of actually reaching the end of the
random
string is (1/10)**N. (which is the reciprocal of the number of
strings,
naturally)


If we wanted an average number of comparisons, we'd have to
calculate a
series, where each term is a probability times a value for k.
   sum((k * 9*10**-k) for k in range(1, 10))




Those terms very rapidly approach 0, so it's safe to stop after a
few.
Looking at the first 9 items, I see a value of 1.1111111


This may not be quite right, but the value is certainly well under
2 for
a population of 10 characters, chosen randomly.  And notice that N
doesn't really come into it.

It's exactly right. You can obtain this result analytically from Johannes' formula above. Just replace 256 with 10 to get that the expected number of comparisons is

(10/9)*(1 - 10**(-N))

The last term shows the dependence on N and is tiny even for N=9.

Oscar

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to