rpjday wrote:
> 
> actually, if you examine the algorithm, the probability that line n
> will be chosen as the "new" random line as you read through the file
> is precisely 1/n, so all lines are equally likely.

I don't think that is what is happening here.  I don't think that all
lines are equally likely.  To me, and maybe I am being obtuse here (it
has happened before), 1/n is only the probability that the saved line
will be overwritten at line n, and n changes for each line read.  In
fact, it may approach equality (I don't really know the math).  Now that
I look at it, it would seem to be weighted in the opposite direction
from what I originally thought.  It would be very hard to get the first
line, since each line after it has a 1/n chance of overwriting it and it
can never be gotten back.  That means that even after the second
iteration there is a 0.5 probability that it would still be there, and
after the 3rd a 0.5 * 0.333 = 0.1665 probability that it would still be
there ...  I will grant that the last one has a 1/n chance though.
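
As a cross-check on the arithmetic above, the survival chance of the
first line can be written as a product.  This assumes that line k
overwrites the saved line with probability 1/k, so the saved line
survives that step with probability 1 - 1/k (not 1/k):

```latex
\[
P(\text{line 1 still saved after } n \text{ lines})
  = \prod_{k=2}^{n} \left(1 - \frac{1}{k}\right)
  = \frac{1}{2} \cdot \frac{2}{3} \cdots \frac{n-1}{n}
  = \frac{1}{n}
\]
```

If that setup is right, the first line survives the third step with
probability 0.5 * 0.667 = 0.333 rather than 0.1665, so the 0.333 factor
may be where my numbers go wrong.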

It has been a really long time since my one stats class, but something
just does not look quite right to me.

    #!/usr/bin/perl

    srand;
    while (<>)
    {
      # $. is the current input line number, so rand($.) < 1
      # is true with probability 1/$.
      if (rand($.) < 1)
        {
          $line = $_;
        }
    }
    print $line;
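
One way to settle this without trusting anyone's arithmetic is to run
the same selection rule many times and count how often each line number
wins.  The following is only a rough sketch: pick_index and tally are
made-up helper names, the rule is applied to line numbers 1..n instead
of reading a file, and the values of n and the trial count are
arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same rule as the script above, applied to line numbers 1..$n
# instead of lines read from <>.  (pick_index is a hypothetical name.)
sub pick_index {
    my ($n) = @_;
    my $chosen;
    for my $k (1 .. $n) {
        # rand($k) < 1 is true with probability 1/$k
        $chosen = $k if rand($k) < 1;
    }
    return $chosen;
}

# Count how often each line number is the final pick over many runs.
sub tally {
    my ($n, $trials) = @_;
    my @count = (0) x ($n + 1);    # index 0 is unused
    $count[ pick_index($n) ]++ for 1 .. $trials;
    return @count[1 .. $n];
}

my ($n, $trials) = (5, 100_000);   # arbitrary illustration values
my @count = tally($n, $trials);
printf "line %d: %6d  (%.3f)\n", $_, $count[$_ - 1], $count[$_ - 1] / $trials
    for 1 .. $n;
```

If the 1/n argument is right, every count should land near trials/n.
If the first line really is disadvantaged, its count should come out
visibly lower than the others.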


Thanks for your patience.

Bret


