On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> I would like to write a script that will select N number of
> random lines in a file. Any suggestions on how to do this?

The program below has the advantage that it doesn't read the whole file
into memory, which is important if the file is large. Save it as an
executable file, randomlines, and then type "randomlines N file" to get
N random lines from file (without repetition). The lines will be in the
same order they were in the file. Type "randomlines -r N file" to get
N random lines in random order.

#! /usr/bin/perl -s
# -s turns on simple switch parsing, so "-r" on the command line sets $r

$N=shift; #first arg is N

srand;
while(<>){
    # keep line number $. with probability N/$. (always, while $. <= N)
    if(rand($.) < $N){
        if(@lines == $N){
            # drop one random element
            splice @lines,int rand $N,1;
        }
        if($r){
            # -r: insert the new line at a random position
            splice @lines, int rand @lines+1, 0, $_;
        }
        else{
            # default: append, which preserves file order
            push @lines, $_;
        }
    }
}
print $_ for @lines;
__END__

The proof that the algorithm is correct is by induction on the number
of lines in the file (also, see Knuth reference below). The program is
based on one in the perl documentation that returns 1 random line from
a file, which I found by typing "perldoc -q 'random line'":

    How do I select a random line from a file?

    Here's an algorithm from the Camel Book:

        srand;
        rand($.) < 1 && ($line = $_) while <>;

    This has a significant advantage in space over reading the whole
    file in.  You can find a proof of this method in The Art of
    Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.

    You can use the File::Random module which provides a function for
    that algorithm:

        use File::Random qw/random_line/;
        my $line = random_line($filename);

    Another way is to use the Tie::File module, which treats the entire
    file as an array.  Simply access a random array element.

Winston Smith, [EMAIL PROTECTED] where x=winstonsmith, y=ispwest.com
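
P.S. For anyone who wants the induction step spelled out, here is a
rough sketch of why the sampling is fair. It only checks that each line
ends up in the output with the same probability; it does not show that
every N-line subset is equally likely. The notation (n for the number
of lines read so far, p_n for the chance that a given one of those
lines is currently held in @lines) is my own shorthand for this sketch,
not Knuth's.

```latex
% Claim: once n >= N lines have been read, each of them is held
% in @lines with probability p_n = N/n.
%
% Base case: for n <= N the test rand($.) < N always succeeds,
% so every line is kept and p_N = 1 = N/N.
%
% Inductive step: line n+1 is accepted with probability N/(n+1).
% A line already held survives if line n+1 is rejected, or if it
% is accepted and a different one of the N held lines is dropped.
\begin{align*}
p_{n+1} &= p_n \left[ \left(1 - \frac{N}{n+1}\right)
           + \frac{N}{n+1} \cdot \frac{N-1}{N} \right] \\
        &= \frac{N}{n} \left[ 1 - \frac{1}{n+1} \right]
         = \frac{N}{n} \cdot \frac{n}{n+1}
         = \frac{N}{n+1}.
\end{align*}
% The new line itself is held with probability N/(n+1) as well,
% so the claim holds for n+1.
```

With N=1 this reduces to the one-line version from perldoc: line n
replaces the held line with probability 1/n.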