On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?

This program never reads the whole file into memory; it holds at most N
lines at a time, which matters if the file is large. Save it as an
executable file named randomlines, then type "randomlines N file" to get N
random lines from file (without repetition). The lines come out in the same
order they had in the file. Type "randomlines -r N file" to get the N lines
in random order instead. If the file has fewer than N lines, all of them
are printed.

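#! /usr/bin/perl -s

        use strict;
        use warnings;

        our $r;                 # set by perl -s when -r is given
        my $N = shift;          # first argument is N
        defined $N && $N =~ /^\d+$/
            or die "usage: randomlines [-r] N [file ...]\n";

        srand;                  # seed the random number generator
        my @lines;              # the current sample, at most N lines
        while (<>) {
            # keep line number $. with probability N/$.
            # (always true for the first N lines)
            next unless rand($.) < $N;

            # sample is full: drop one random element to make room
            splice @lines, int rand($N), 1 if @lines == $N;

            if ($r) {
                # -r: insert at a random position, so the order is random
                splice @lines, int rand(@lines + 1), 0, $_;
            }
            else {
                # default: append, so file order is preserved
                push @lines, $_;
            }
        }

        print for @lines;

__END__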

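The algorithm is correct by induction on the number of lines read (see also
the Knuth reference below). Suppose that after k >= N lines every line seen
so far is in the sample with probability N/k. Line k+1 is accepted with
probability N/(k+1), and when it is accepted each line already in the sample
is evicted with probability 1/N. So an old line survives the step with
probability 1 - (N/(k+1))(1/N) = k/(k+1), and is still in the sample with
probability (N/k)(k/(k+1)) = N/(k+1).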
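
As a sanity check on that claim, here is a small sketch that applies the
same keep/evict rule to a list instead of a file and counts how often each
item is picked. The name sample_lines and the numbers (10 items, samples of
3, 20000 trials) are made up for the illustration; they are not part of the
script above.

```perl
use strict;
use warnings;

# Hypothetical helper: the same sampling rule as the script, applied to a
# list. $k plays the role of $. (the 1-based line number).
sub sample_lines {
    my ($n, @input) = @_;
    my @sample;
    my $k = 0;
    for my $line (@input) {
        $k++;
        next unless rand($k) < $n;      # keep with probability n/k
        # sample is full: evict one element at random
        splice @sample, int rand($n), 1 if @sample == $n;
        push @sample, $line;            # append, so input order is kept
    }
    return @sample;
}

# Count how often each of 10 "lines" ends up in a sample of 3.
my ($n, $total, $trials) = (3, 10, 20_000);
my %count;
for (1 .. $trials) {
    $count{$_}++ for sample_lines($n, 1 .. $total);
}

# Each frequency should come out close to n/total = 0.3.
printf "line %2d: %.3f\n", $_, ($count{$_} // 0) / $trials for 1 .. $total;
```

If the rule is right, all ten frequencies should hover around 0.3, with no
bias toward early or late lines.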

It is based on a program in the Perl documentation that returns one random
line from a file, which I found by typing "perldoc -q 'random line'":

  How do I select a random line from a file?
  
        Here's an algorithm from the Camel Book:
  
            srand;
            rand($.) < 1 && ($line = $_) while <>;
  
        This has a significant advantage in space over reading the whole file
        in.  You can find a proof of this method in The Art of Computer
        Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
  
        You can use the File::Random module which provides a function for that
        algorithm:
  
                use File::Random qw/random_line/;
                my $line = random_line($filename);
  
        Another way is to use the Tie::File module, which treats the entire
        file as an array.  Simply access a random array element.
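
To make the quoted one-liner and the Tie::File suggestion a bit more
concrete, here is a rough sketch of both. The sub names random_line_stream
and random_line_tied are my own, not from the documentation. The Tie::File
version opens the file read-only through the module's mode option, and it
returns the line without its trailing newline because Tie::File strips
record separators by default.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Streaming version: the Camel Book one-liner written out as a loop.
# Line number $. replaces the current pick with probability 1/$. .
sub random_line_stream {
    my ($fh) = @_;
    my $line;
    while (my $candidate = <$fh>) {
        $line = $candidate if rand($.) < 1;
    }
    return $line;               # undef if the input was empty
}

# Tie::File version: the file appears as an array, one element per line.
sub random_line_tied {
    my ($filename) = @_;
    tie my @lines, 'Tie::File', $filename, mode => O_RDONLY
        or die "cannot tie $filename: $!";
    my $line = @lines ? $lines[ int rand @lines ] : undef;
    untie @lines;
    return $line;
}

# Usage with an in-memory file, so the sketch runs without any input file.
my $text = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print random_line_stream($fh);
close $fh;
```

The streaming version expects a freshly opened filehandle, since it relies
on $. counting from the first line.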


Winston Smith, [EMAIL PROTECTED] where x=winstonsmith, y=ispwest.com

