On Tue, Nov 27, 2001 at 11:31:39AM -0600, Andy Bach wrote: > Hi. > > Okay, would the maths folk like to offer a helpful explanation? > > a > > ---------- Forwarded message ---------- > > Question: > How do I select a random line from a file? > > Here's an algorithm from the Camel Book: > > srand; > rand($.) < 1 && ($line = $_) while <>; > > This has a significant advantage in space over reading the whole > file in. A simple proof by induction is available upon request if > you doubt its correctness. >
Here is a proof by induction. When N = 1, the line is selected with a probability of 1, so this algorithm works for N = 1. Assume this algorithm works for N. For N + 1, the last line is selected with a probability of 1/(N+1). Each of lines 1 through N is selected with a probability of (1/N)(1-(1/(N+1))) = (1/N)(N/(N+1)) = 1/(N+1). (Thats's the probability of the line being selected out of the N lines times the probability of line N+1 not being selected.) So, if this algorithm works for N it also works for N + 1. Therefore, this algorithm works for any N >= 1. Ronald
