Re: Comparing strings from the back?

Oscar Benjamin Thu, 06 Sep 2012 04:07:56 -0700

On Thu, 06 Sep 2012 06:07:38 -0400, Dave Angel <d...@davea.name> wrote:

For random strings (as defined below), the average compare time is
effectively unrelated to the size of the string, once the size

passes

some point.

Define random string as being a selection from a set of characters,

with

replacement. So if we pick some set of characters, say 10 (or 256,

it

doesn't really matter), the number of possible strings is 10**N.

The likelihood of not finding a mismatch within k characters is(1/10)**k The likelihood of actually reaching the end of the

random

string is (1/10)**N. (which is the reciprocal of the number of

strings,

naturally)

If we wanted an average number of comparisons, we'd have to

calculate a

series, where each term is a probability times a value for k.
   sum((k * 9*10**-k) for k in range(1, 10))

Those terms very rapidly approach 0, so it's safe to stop after a

few.

Looking at the first 9 items, I see a value of 1.1111111

This may not be quite right, but the value is certainly well under

2 for

a population of 10 characters, chosen randomly.  And notice that N
doesn't really come into it.

It's exactly right. You can obtain this result analytically fromJohannes' formula above. Just replace 256 with 10 to get that theexpected number of comparisons is


(10/9)*(1 - 10**(-N))

The last term shows the dependence on N and is tiny even for N=9.

Oscar

--
http://mail.python.org/mailman/listinfo/python-list

Re: Comparing strings from the back?

Reply via email to