On Tue, Nov 27, 2001 at 11:31:39AM -0600, Andy Bach wrote:
> Hi.
> 
> Okay, would the maths folk like to offer a helpful explanation? 
> 
> a
> 
> ---------- Forwarded message ----------
> 
> Question:
> How do I select a random line from a file?
> 
>     Here's an algorithm from the Camel Book:
> 
>         srand;
>         rand($.) < 1 && ($line = $_) while <>;
> 
>     This has a significant advantage in space over reading the whole
>     file in. A simple proof by induction is available upon request if
>     you doubt its correctness.
> 

Here is a proof by induction.

When N = 1, the line is selected with a probability of 1, so this
algorithm works for N = 1.

Assume this algorithm works for N.  For N + 1, the last line is selected
with a probability of 1/(N+1).  Each of lines 1 through N is selected with
a probability of (1/N)(1-(1/(N+1))) = (1/N)(N/(N+1)) = 1/(N+1).  (Thats's
the probability of the line being selected out of the N lines times the
probability of line N+1 not being selected.)  So, if this algorithm works
for N it also works for N + 1.

Therefore, this algorithm works for any N >= 1.

Ronald

Reply via email to