On Tue, Nov 27, 2001 at 11:31:39AM -0600, Andy Bach wrote:
> Okay, would the maths folk like to offer a helpful explanation? 

IANAMathFolk, but I can quote (with fair use) from the Cookbook:

   Obviously, a file with one line (N=1) is fair: you always keep the first
   line because 1/1 = 100%, making it fair for files of 1 line. For a file
   with two lines, N=2. You always keep the first line; then when reaching
   the second line, you have a 50% chance of keeping it. Thus, both lines
   have an equal chance of being selected, which shows that N=2 is fair. For
   a file with three lines, N=3. You have a one-third chance, 33%, of keeping
   that third line. That leaves a two-thirds chance of retaining one of the
   first two out of the three lines. But we've already shown that for those
   first two lines there's a 50-50 chance of selecting either one. 50 percent
   of two-thirds is one-third. Thus, you have a one-third chance of selecting
   each of the three lines of the file.
 
   In the general case, a file of N+1 lines will choose the last line 1/(N+1)
   times and one of the previous N lines N/(N+1) times. Dividing N/(N+1) by N
   leaves us with 1/(N+1) for each the first N lines in our N+1 line file,
   and also 1/(N+1) for line number N+1. The algorithm is therefore fair for
   all N, where N is a positive integer.

-- 
"It's God.  No, not Richard Stallman, or Linus Torvalds, but God."
(By Matt Welsh)

Reply via email to