Jess Balint wrote:
> 
> Hello all, I have a file of 3,210,008 CSV records and need to take a
> random sample of it. I tried hacking something together a while ago,
> but it seemed to draw from only 65,536 distinct records. When I need
> a 500k sample, this creates a problem.
> 
> Here is my old code. I know the logic allows dups, but what would
> impose the limit? I think with 500,000 samples there wouldn't be a
> problem getting more than 65,536 distinct records, but that number
> (2**16) is too suspicious for me to ignore. Thanks.
> 
> #!/usr/local/bin/perl -w
> 
> open (FILE, "consumer.sample.sasdump.txt") or die "Cannot open: $!";
> open (NEW, ">consumer.new") or die "Cannot open: $!";
> 
> @data = <FILE>;
> 
> # sampling WITH replacement: nothing stops the same
> # line from being picked twice
> for ( my $jess = 1; $jess < 500000; $jess++ ) {
>         $index = rand @data;
>         print NEW $data[$index];
> }
> 
> close(FILE);
> close(NEW);


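The 65,536 ceiling is most likely rand itself, not your loop: on some
builds rand carries only 16 bits of randomness, so rand @data can take
at most 2**16 distinct values no matter how large @data is. A quick
diagnostic sketch, if you want to check your build (the 3_210_008 is
just your record count, used as the range):

#!/usr/local/bin/perl -w
# count the distinct values rand produces over one million draws;
# a result stuck near 65,536 points to a 16-bit rand
use strict;

my %seen;
$seen{ int rand 3_210_008 }++ for 1 .. 1_000_000;
print scalar(keys %seen), " distinct values\n";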
This should do what you want; it samples without replacement, so no
record can be printed twice:

#!/usr/local/bin/perl -w
use strict;

srand;

open FILE, 'consumer.sample.sasdump.txt'
    or die "Cannot open 'consumer.sample.sasdump.txt': $!";
open NEW,  '> consumer.new' or die "Cannot open 'consumer.new': $!";

my @data = <FILE>;

# sample WITHOUT replacement: splice removes each chosen
# line from @data, so it can never be drawn again
for ( 1 .. 500_000 ) {
    print NEW splice @data, rand @data, 1;
}

close FILE;
close NEW;
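
If the splice version turns out slow (every splice shifts all the
elements after the removed one), a partial Fisher-Yates shuffle is one
alternative sketch, assuming the same filenames and 500,000-line
sample as above:

#!/usr/local/bin/perl -w
# same sample-without-replacement idea, but each draw is O(1):
# swap the chosen line to the front of the array instead of
# removing it with splice
use strict;

open FILE, 'consumer.sample.sasdump.txt'
    or die "Cannot open 'consumer.sample.sasdump.txt': $!";
open NEW,  '> consumer.new' or die "Cannot open 'consumer.new': $!";

my @data = <FILE>;

for my $i ( 0 .. 499_999 ) {
    # pick a random index from the not-yet-chosen tail ...
    my $j = $i + int rand( @data - $i );
    # ... and swap it into slot $i so it cannot be drawn again
    @data[ $i, $j ] = @data[ $j, $i ];
    print NEW $data[$i];
}

close FILE;
close NEW;

(If List::Util is available on your perl, its shuffle function would
randomize @data in one call and you could just print the first
500,000 lines.)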



John
-- 
use Perl;
program
fulfillment
