RE: Re: Perl code for comparing two files

2009-05-09 Thread Wagner, David --- Senior Programmer Analyst --- CFS
 -Original Message-
 From: news [mailto:n...@ger.gmane.org] On Behalf Of Richard Loveland
 Sent: Friday, May 08, 2009 11:59
 To: beginners@perl.org
 Subject: Re: Perl code for comparing two files
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Mr. Adhikary,
 
 The following will take any number of files as arguments, in 
 the format
 you described (I even tested it! :-)). It goes through each line of
 those files, stuffing (the relevant part of) each line in a 
 'seen' hash
 (more on that, and other, hash techniques here if you're interested:
 http://www.perl.com/pub/a/2006/11/02/all-about-hashes.html).
 
 The code below does not keep track of line numbers as you 
 requested, but
 I think the hash technique used here could help you as you approach a
 solution to your particular problem.
 
 
 #!/usr/bin/perl
 
 use strict;
 use warnings;
 use File::Slurp; # This is where 'read_file' lives
 
 my %seen;
 
 for my $arg ( @ARGV ) {
 my @lines = read_file( $arg );
 for my $line ( @lines ) {
 chomp $line;
 my @elems = split / /, $line;
 my $value = $elems[1];
 $seen{$value}++;
 }
 }
 
 for my $k ( keys %seen ) {
 print $k, "\n" if $seen{$k} > 1;
 }
 
This is similar to the above, but without File::Slurp; it uses a hash
combined with an array, with [0] being the count of seen items, [> zero]
being the line number, and the index being the file it came from. I have
given you a Data::Dumper of the result. I ran it with the files you provided.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my %seen;
my $MyLineNbr = 1;
my %MFN = ();
my $MyFilenames = \%MFN;
my $MyFileCnt = 1;
my $MyCurrFile = q[];

while ( <> ) {
if ( $ARGV ne $MyCurrFile ) {
printf "Filename: %s (%d)\n", $ARGV, $MyFileCnt;
$MyCurrFile = $ARGV;
$MyFilenames->{$MyCurrFile} = $MyFileCnt++;
$MyLineNbr = 0;
 }
chomp;
$MyLineNbr++;
next if ( /^\s*$/ );
my @elems = split (/ /, $_);
my $value = $elems[1];
$seen{$value}[0]++;
$seen{$value}[$MyFilenames->{$MyCurrFile}] = $MyLineNbr;
}
print Dumper(\%seen);

 
 Regards,
 Rich Loveland
 
 
 Anirban Adhikary wrote:
  Hi List
  I am writing Perl code which takes two or more files as arguments. It
  will then check the values of each line of one file against another
  file. If a value matches, it will write the value along with the line
  number to another file (say, outputfile).
  
  The source files are as follows
  
  Contents of abc.txt
  1 2325278241,P0
  2 2296250723,MH
  3 2296250724,MH
  4 2325277178,P0
  5 7067023316,WL
  6 7067023329,WL
  7 2296250759,MH
  8 7067023453,WL
  9 7067023455,WL
  10 555413,EA05
  ###
  Contents of xyz.txt
  1 7067023453,WL
  2 31-DEC-27,2O,7038590671
  3 31-DEC-27,2O,7038596464
  4 31-DEC-27,2O,7038596482
  5 2296250724,MH
  6 31-DEC-27,2O,7038597632
  7 31-DEC-27,2O,7038589511
  8 31-DEC-11,2O,7038590671
  9 7067023455,WL
  10 31-DEC-27,2O,7038555744
  ###
  Contents of pqr.txt
  1 2325278241,P0
  2 7067023316,WL
  3 7067023455,WL
  4 2296250724,MH
  
  
  
  
  
  
  For this requirement I have written the following code 
 which works fine for
  2 input files
  
  use strict;
  use warnings;
  
  use Benchmark;
  
  if(@ARGV < 2) {
  print "Please enter atleast two or more  .orig file names \n";
  exit 0;
  }
  my @file_names = @ARGV;
  chomp(@file_names);
  my @files_to_process;
  
  for(@file_names) {
  if( -s $_){
  print "File $_ exists\n";
  push(@files_to_process,$_);
  }
  elsif( -e $_) {
  print "File $_ exists but it has zero byte size\n";
  }
  else {
  print "File $_ does not exists \n";
  }
  }
  
  my $count = @files_to_process;
  if( $count < 2 ) {
  print "Atleast 2 .orig files are required to continue this program\n";
  exit 0;
  }
  
  my $output_file = "outputfile";
  my $value = 0;
  my $start_time = new Benchmark;
  
  
  if( $count >= 2 ) {
  while ($count) {
  my ($files_after_processing_pointer,$return_val) =
  create_itermediate_file(\@files_to_process,$value);
  my @files_after_processing =
  @$files_after_processing_pointer;
  $count = @files_after_processing;
  $value = $return_val;
  @files_to_process = @files_after_processing;
  
  }
  
  my $end_time = new Benchmark;
  my $difference = timediff($end_time, $start_time);
  print "It took ", timestr($difference), " to execute the program\n";
  
  }
  
  
  
  
  sub create_itermediate_file {
  my $file_pointer = $_[0];
  my $counter = $_[1];
  my @file_content = @$file_pointer

RE: Perl code for comparing two files

2009-05-09 Thread Wagner, David --- Senior Programmer Analyst --- CFS
 -Original Message-
 From: Anirban Adhikary [mailto:anirban.adhik...@gmail.com] 
 Sent: Monday, May 04, 2009 06:40
 To: beginners@perl.org
 Subject: Perl code for comparing two files
 
 Hi List
 I am writing Perl code which takes two or more files as arguments. It
 will then check the values of each line of one file against another
 file. If a value matches, it will write the value along with the line
 number to another file (say, outputfile).
 
 The source files are as follows
 
 Contents of abc.txt
 1 2325278241,P0
 2 2296250723,MH
 3 2296250724,MH
 4 2325277178,P0
 5 7067023316,WL
 6 7067023329,WL
 7 2296250759,MH
 8 7067023453,WL
 9 7067023455,WL
 10 555413,EA05
 ###
 Contents of xyz.txt
 1 7067023453,WL
 2 31-DEC-27,2O,7038590671
 3 31-DEC-27,2O,7038596464
 4 31-DEC-27,2O,7038596482
 5 2296250724,MH
 6 31-DEC-27,2O,7038597632
 7 31-DEC-27,2O,7038589511
 8 31-DEC-11,2O,7038590671
 9 7067023455,WL
 10 31-DEC-27,2O,7038555744
 ###
 Contents of pqr.txt
 1 2325278241,P0
 2 7067023316,WL
 3 7067023455,WL
 4 2296250724,MH
 
 
 
 
Here is a way where a 'seen' hash has two array elements: [0] -
count, [1]: file number and line number for each seen item.
Code starts on next line:
use strict;
use warnings;

use Data::Dumper;

my %seen;
my $MyLineNbr = 1;
my %MFN = ();
my $MyFilenames = \%MFN;
my $MyFileCnt = 1;
my $MyCurrFile = q[];
my $MyActIdx = 1;

while ( <> ) {
if ( $ARGV ne $MyCurrFile ) {
printf "Filename: %s\n", $ARGV;
$MyCurrFile = $ARGV;
$MyFilenames->{$MyCurrFile} = $MyFileCnt++;
$MyLineNbr = 0;
 }
chomp;
$MyLineNbr++;
next if ( /^\s*$/ );
my @elems = split (/ /, $_);
my $value = $elems[1];
$seen{$value}[0]++;
$seen{$value}[$MyActIdx] .= $MyFilenames->{$MyCurrFile} . q[;] .
$MyLineNbr . q[^];
}
print Dumper(\%seen);
^--- code ends here

I leave it to you to get the output, but this should give you what you
need to work with.

 If you have any questions and/or problems, please let me know.
 Thanks.
 
Wags ;)
David R. Wagner
Senior Programmer Analyst
FedEx Freight
1.719.484.2097 TEL
1.719.484.2419 FAX
1.408.623.5963 Cell
http://fedex.com/us 


 
 
 For this requirement I have written the following code which 
 works fine for
 2 input files
 
 use strict;
 use warnings;
 
 use Benchmark;
 
 if(@ARGV < 2) {
 print "Please enter atleast two or more  .orig file names \n";
 exit 0;
 }
 my @file_names = @ARGV;
 chomp(@file_names);
 my @files_to_process;
 
 for(@file_names) {
 if( -s $_){
 print "File $_ exists\n";
 push(@files_to_process,$_);
 }
 elsif( -e $_) {
 print "File $_ exists but it has zero byte size\n";
 }
 else {
 print "File $_ does not exists \n";
 }
 }
 
 my $count = @files_to_process;
 if( $count < 2 ) {
 print "Atleast 2 .orig files are required to continue this program\n";
 exit 0;
 }
 
 my $output_file = "outputfile";
 my $value = 0;
 my $start_time = new Benchmark;
 
 
 if( $count >= 2 ) {
 while ($count) {
 my ($files_after_processing_pointer,$return_val) =
 create_itermediate_file(\@files_to_process,$value);
 my @files_after_processing =
 @$files_after_processing_pointer;
 $count = @files_after_processing;
 $value = $return_val;
 @files_to_process = @files_after_processing;
 
 }
 
 my $end_time = new Benchmark;
 my $difference = timediff($end_time, $start_time);
 print "It took ", timestr($difference), " to execute the program\n";
 
 }
 
 
 
 
 sub create_itermediate_file {
 my $file_pointer = $_[0];
 my $counter = $_[1];
 my @file_content = @$file_pointer;
 
 if($counter == 0) {
 my($first_file,$second_file) = splice
 (@file_content, 0, 2);
 open my $orig_first, '<', $first_file
 or die "could not open $first_file: $!";
 open my $orig_second, '<', $second_file
 or die "could not open $second_file: $!";
 open my $output_fh, '>', $output_file
 or die "could not open $output_file: $!";
 
 my %content_first;
 while (my $line = <$orig_first>) {
 chomp $line;
 if ($line) {
 
 my($line_num,$value) = split(" ",$line);
 
 $content_first{$value} = $line_num

Re: Perl code for comparing two files

2009-05-09 Thread John W. Krahn

Wagner, David --- Senior Programmer Analyst --- CFS wrote:

-Original Message-
From: Anirban Adhikary [mailto:anirban.adhik...@gmail.com] 
Sent: Monday, May 04, 2009 06:40

To: beginners@perl.org
Subject: Perl code for comparing two files

Hi List
I am writing Perl code which takes two or more files as arguments. It
will then check the values of each line of one file against another
file. If a value matches, it will write the value along with the line
number to another file (say, outputfile).

The source files are as follows

Contents of abc.txt
1 2325278241,P0
2 2296250723,MH
3 2296250724,MH
4 2325277178,P0
5 7067023316,WL
6 7067023329,WL
7 2296250759,MH
8 7067023453,WL
9 7067023455,WL
10 555413,EA05
###
Contents of xyz.txt
1 7067023453,WL
2 31-DEC-27,2O,7038590671
3 31-DEC-27,2O,7038596464
4 31-DEC-27,2O,7038596482
5 2296250724,MH
6 31-DEC-27,2O,7038597632
7 31-DEC-27,2O,7038589511
8 31-DEC-11,2O,7038590671
9 7067023455,WL
10 31-DEC-27,2O,7038555744
###
Contents of pqr.txt
1 2325278241,P0
2 7067023316,WL
3 7067023455,WL
4 2296250724,MH


Here is a way where a 'seen' hash has two array elements: [0] -
count, [1]: file number and line number for each seen item.
Code starts on next line:
use strict;
use warnings;

use Data::Dumper;

my %seen;
my $MyLineNbr = 1;


Why?  Perl already has a built-in line number variable.



my %MFN = ();
my $MyFilenames = \%MFN;


Why declare two variables when you are only using one?  Why use a hash 
reference instead of just using a hash (like you do with %seen?)




my $MyFileCnt = 1;
my $MyCurrFile = q[];
my $MyActIdx = 1;


This value never changes so why use a variable?  Why no variable for 
index 0 of the array?




while ( <> ) {
if ( $ARGV ne $MyCurrFile ) {
printf "Filename: %s\n", $ARGV;


Why not just:

print "Filename: $ARGV\n";



$MyCurrFile = $ARGV;
$MyFilenames->{$MyCurrFile} = $MyFileCnt++;


Why not just:

$MyFilenames->{$MyCurrFile}++;



$MyLineNbr = 0;
 }
chomp;
$MyLineNbr++;
next if ( /^\s*$/ );
my @elems = split (/ /, $_);
my $value = $elems[1];


Why not just:

my $value = ( split )[ 1 ];



$seen{$value}[0]++;


Why does this array element use a literal number and the next line use a 
variable?




$seen{$value}[$MyActIdx] .= $MyFilenames->{$MyCurrFile} . q[;] .
$MyLineNbr . q[^];


Since you are using a HoA anyway just push() 
"$MyFilenames->{$MyCurrFile};$MyLineNbr" onto the array and the count 
will be the array in scalar context.
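
For illustration, a minimal sketch of that push()-based bookkeeping might 
look like this (this is not code from the thread; it assumes the same 
"line-number value" input format as the sample files quoted above):

#!/usr/bin/perl
use strict;
use warnings;

my %seen;

while (<>) {
    chomp;
    next if /^\s*$/;
    my $value = ( split )[1];            # second whitespace-separated field
    push @{ $seen{$value} }, "$ARGV;$.";  # one "file;line" record per hit
    close ARGV if eof;                    # reset $. at the end of each file
}

for my $value ( keys %seen ) {
    my @where = @{ $seen{$value} };
    next unless @where > 1;               # the count is just scalar @where
    print "$value seen ", scalar @where, " times: @where\n";
}

Each value then carries one "file;line" entry per occurrence, and 
scalar @{ $seen{$value} } gives the count directly.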




}
print Dumper(\%seen);
^--- code ends here

I leave it to you to get the output, but this should give you what you
need to work with.

If you have any questions and/or problems, please let me know.
Thanks.




200 lines trimmed.

Please trim your posts and remove extraneous junk at the end.



John
--
Those people who think they know everything are a great
annoyance to those of us who do.-- Isaac Asimov





Re: Perl code for comparing two files

2009-05-08 Thread Richard Loveland
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Mr. Adhikary,

The following will take any number of files as arguments, in the format
you described (I even tested it! :-)). It goes through each line of
those files, stuffing (the relevant part of) each line in a 'seen' hash
(more on that, and other, hash techniques here if you're interested:
http://www.perl.com/pub/a/2006/11/02/all-about-hashes.html).

The code below does not keep track of line numbers as you requested, but
I think the hash technique used here could help you as you approach a
solution to your particular problem.


#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp; # This is where 'read_file' lives

my %seen;

for my $arg ( @ARGV ) {
my @lines = read_file( $arg );
for my $line ( @lines ) {
chomp $line;
my @elems = split / /, $line;
my $value = $elems[1];
$seen{$value}++;
}
}

for my $k ( keys %seen ) {
print $k, "\n" if $seen{$k} > 1;
}


Regards,
Rich Loveland


Anirban Adhikary wrote:
 Hi List
 I am writing Perl code which takes two or more files as arguments. It will
 then check the values of each line of one file against another file. If a
 value matches, it will write the value along with the line number to
 another file (say, outputfile).
 
 The source files are as follows
 
 Contents of abc.txt
 1 2325278241,P0
 2 2296250723,MH
 3 2296250724,MH
 4 2325277178,P0
 5 7067023316,WL
 6 7067023329,WL
 7 2296250759,MH
 8 7067023453,WL
 9 7067023455,WL
 10 555413,EA05
 ###
 Contents of xyz.txt
 1 7067023453,WL
 2 31-DEC-27,2O,7038590671
 3 31-DEC-27,2O,7038596464
 4 31-DEC-27,2O,7038596482
 5 2296250724,MH
 6 31-DEC-27,2O,7038597632
 7 31-DEC-27,2O,7038589511
 8 31-DEC-11,2O,7038590671
 9 7067023455,WL
 10 31-DEC-27,2O,7038555744
 ###
 Contents of pqr.txt
 1 2325278241,P0
 2 7067023316,WL
 3 7067023455,WL
 4 2296250724,MH
 
 
 
 
 
 
 For this requirement I have written the following code which works fine for
 2 input files
 
 use strict;
 use warnings;
 
 use Benchmark;
 
 if(@ARGV < 2) {
 print "Please enter atleast two or more  .orig file names \n";
 exit 0;
 }
 my @file_names = @ARGV;
 chomp(@file_names);
 my @files_to_process;
 
 for(@file_names) {
 if( -s $_){
 print "File $_ exists\n";
 push(@files_to_process,$_);
 }
 elsif( -e $_) {
 print "File $_ exists but it has zero byte size\n";
 }
 else {
 print "File $_ does not exists \n";
 }
 }
 
 my $count = @files_to_process;
 if( $count < 2 ) {
 print "Atleast 2 .orig files are required to continue this program\n";
 exit 0;
 }
 
 my $output_file = "outputfile";
 my $value = 0;
 my $start_time = new Benchmark;
 
 
 if( $count >= 2 ) {
 while ($count) {
 my ($files_after_processing_pointer,$return_val) =
 create_itermediate_file(\@files_to_process,$value);
 my @files_after_processing =
 @$files_after_processing_pointer;
 $count = @files_after_processing;
 $value = $return_val;
 @files_to_process = @files_after_processing;
 
 }
 
 my $end_time = new Benchmark;
 my $difference = timediff($end_time, $start_time);
 print "It took ", timestr($difference), " to execute the program\n";
 
 }
 
 
 
 
 sub create_itermediate_file {
 my $file_pointer = $_[0];
 my $counter = $_[1];
 my @file_content = @$file_pointer;
 
 if($counter == 0) {
 my($first_file,$second_file) = splice
 (@file_content, 0, 2);
 open my $orig_first, '<', $first_file
 or die "could not open $first_file: $!";
 open my $orig_second, '<', $second_file
 or die "could not open $second_file: $!";
 open my $output_fh, '>', $output_file
 or die "could not open $output_file: $!";
 
 my %content_first;
 while (my $line = <$orig_first>) {
 chomp $line;
 if ($line) {
 
 my($line_num,$value) = split(" ",$line);
 
 $content_first{$value} = $line_num;
 }
 }
 
 my %content_second;
 while (my $line = <$orig_second>) {
 chomp $line;
 if ($line) {
 
 my($line_num,$value) = 

Perl code for comparing two files

2009-05-04 Thread Anirban Adhikary
Hi List
I am writing Perl code which takes two or more files as arguments. It will
then check the values of each line of one file against another file. If a
value matches, it will write the value along with the line number to
another file (say, outputfile).

The source files are as follows

Contents of abc.txt
1 2325278241,P0
2 2296250723,MH
3 2296250724,MH
4 2325277178,P0
5 7067023316,WL
6 7067023329,WL
7 2296250759,MH
8 7067023453,WL
9 7067023455,WL
10 555413,EA05
###
Contents of xyz.txt
1 7067023453,WL
2 31-DEC-27,2O,7038590671
3 31-DEC-27,2O,7038596464
4 31-DEC-27,2O,7038596482
5 2296250724,MH
6 31-DEC-27,2O,7038597632
7 31-DEC-27,2O,7038589511
8 31-DEC-11,2O,7038590671
9 7067023455,WL
10 31-DEC-27,2O,7038555744
###
Contents of pqr.txt
1 2325278241,P0
2 7067023316,WL
3 7067023455,WL
4 2296250724,MH






For this requirement I have written the following code which works fine for
2 input files

use strict;
use warnings;

use Benchmark;

if(@ARGV < 2) {
print "Please enter atleast two or more  .orig file names \n";
exit 0;
}
my @file_names = @ARGV;
chomp(@file_names);
my @files_to_process;

for(@file_names) {
if( -s $_){
print "File $_ exists\n";
push(@files_to_process,$_);
}
elsif( -e $_) {
print "File $_ exists but it has zero byte size\n";
}
else {
print "File $_ does not exists \n";
}
}

my $count = @files_to_process;
if( $count < 2 ) {
print "Atleast 2 .orig files are required to continue this program\n";
exit 0;
}

my $output_file = "outputfile";
my $value = 0;
my $start_time = new Benchmark;


if( $count >= 2 ) {
while ($count) {
my ($files_after_processing_pointer,$return_val) =
create_itermediate_file(\@files_to_process,$value);
my @files_after_processing =
@$files_after_processing_pointer;
$count = @files_after_processing;
$value = $return_val;
@files_to_process = @files_after_processing;

}

my $end_time = new Benchmark;
my $difference = timediff($end_time, $start_time);
print "It took ", timestr($difference), " to execute the program\n";

}




sub create_itermediate_file {
my $file_pointer = $_[0];
my $counter = $_[1];
my @file_content = @$file_pointer;

if($counter == 0) {
my($first_file,$second_file) = splice
(@file_content, 0, 2);
open my $orig_first, '<', $first_file
or die "could not open $first_file: $!";
open my $orig_second, '<', $second_file
or die "could not open $second_file: $!";
open my $output_fh, '>', $output_file
or die "could not open $output_file: $!";

my %content_first;
while (my $line = <$orig_first>) {
chomp $line;
if ($line) {

my($line_num,$value) = split(" ",$line);

$content_first{$value} = $line_num;
}
}

my %content_second;
while (my $line = <$orig_second>) {
chomp $line;
if ($line) {

my($line_num,$value) = split(" ",$line);

$content_second{$value} = $line_num;
}
}

foreach my $key (sort keys
%content_second) {
if (exists
$content_first{$key} ) {
print $output_fh
"$content_second{$key} $key" ,"\n";
}
}
$counter += 1;
return (\@file_content,$counter);
}
if ($counter != 0) {
my $file_pointer = $_[0];
my $counter = $_[1];
my @file_content_mod = @$file_pointer;
my($file_to_process) = shift(@file_content_mod);


open my $orig_file, '<', $file_to_process
or die "could not open $file_to_process: $!";
open my $output_fh, '>>', $output_file

Problem in comparing two files using compare.pm

2007-08-29 Thread [EMAIL PROTECTED]
Hi,

Please find below my code to compare 2 files
source.txt,destination.txt in folder seek

#!perl

use File::Compare;

if(Compare("source.txt","destination.txt")==0)
{
print "They're equal\n";
}

Please find below the error I am getting while running the code in
command prompt

D:\seekperl filematch.pl

syntax error at C:\Program Files\Perl\lib/File/Compare.pm line 3, near
"use 5.005_64"
BEGIN failed--compilation aborted at filematch.pl line 3.

Please help me to rectify this problem..

Note: On my other computer with perl this code works properly.

Regards,
Prabhu






Re: Problem in comparing two files using compare.pm

2007-08-29 Thread Stephen Kratzer
On Wednesday 29 August 2007 00:43:52 [EMAIL PROTECTED] wrote:
 Hi,

 Please find below my code to compare 2 files
 source.txt,destination.txt in folder seek

 #!perl

 use File::Compare;

 if(Compare("source.txt","destination.txt")==0)
   {
   print "They're equal\n";
   }

 Please find below the error I am getting while running the code in
 command prompt

 D:\seekperl filematch.pl

 syntax error at C:\Program Files\Perl\lib/File/Compare.pm line 3, near
 "use 5.005_64"
 BEGIN failed--compilation aborted at filematch.pl line 3.

 Please help me to rectify this problem..

 Note: On my other computer with perl this code works properly.

 Regards,
 Prabhu

The version of Perl on that computer is probably older than that required by 
the use statement.
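
One quick way to confirm which perl is being picked up on that machine (a 
small sketch, not part of the original exchange) is to print the running 
interpreter's version; Compare.pm's "use 5.005_64;" refuses anything older:

#!perl
# $] is the version of the perl that is actually running this script.
# File::Compare's "use 5.005_64;" needs at least perl 5.005_64.
print "This is perl $]\n";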





Re: Comparing two files of 8million lines/rows ...

2006-08-17 Thread Mumia W.

On 08/16/2006 04:35 PM, [EMAIL PROTECTED] wrote:

Hi all,

I have two database tables, one is local and one is on a WAN. They are supposed
to be in sync, but at the moment they are not. There are 8 million+
rows on this table.

I tried to do SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO FROM
[EMAIL PROTECTED] ), leave that running for hours and all thru the night and 
guess
what, am still waiting for the result to come out ...

So what I decided is I want to extract the records into a flat file and then
write a Perl script to skim thru each line and check whether it exists on the
other file. While there will be 8million+ lines, the file will not be big
because I am only extracting one column from the table.

Does anyone have an existing Perl code that does a similar thing like this
already? It will be much appreciated if you can send it to me and then I will
just modify it accordingly.

Example logic that I have is this:

FILE1:
MICKEY
MINNIE
DONALD
GOOFY
PLUTO

FILE2:
MICKEY
MINNIE
DONALD
GOOFY
PLUTO
BUGS-BUNNY

So search FILE1 for all line entries of FILE2 then output whoever does not exist
into FILE3. So after running the script, I should have ...

FILE3:
BUGS-BUNNY

What I currently have is that I read all of FILE2's entries into an array? Read
FILE1 one line at a time using m/// and if there is no m///, print that to
FILE3.

It seems to work fine for 1000 lines of entries, but am not particularly sure
how long will that take for 8million+ rows, not even sure if I can create an
array to contain 8million+ plus rows, if I can't, then am back to doing this on
the database instead. Another option am looking at is just to read FILE1 one
line at a time and do a grep $string_to_search FILE2 but I do not know how to
do a grep-like syntax against a file on Perl especially if the search string is
a variable.

Why I prefer using a script is so am not putting loads into the database not to
mention that I can put more logic into the script than on the SQL statement.

Any advice or other options will be very much appreciated. Thanks in
advance.






Obviously, you want to do this as efficiently as possible 
because of the humongous size of the data. Hashes are the 
fastest structures for letting you know if some data has 
already been seen.


Your problem is core. Do you have enough core memory to read 
all of the data of one of the table columns into memory? If so, 
then the solution would be almost trivial; if not, then it's 
probably not trivial, but also not hard.


My advice is to attempt to suck the entire column for one 
table into memory; FILE1 should become hash keys (with empty 
values). Then you would open FILE2, loop through the lines 
(records) and output any record that does not appear in the hash.
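
A rough sketch of that in Perl (the file names here are placeholders for 
the one-column extracts, not anything from the original post):

#!/usr/bin/perl
use strict;
use warnings;

# FILE1 becomes hash keys (empty values); FILE2 is then streamed past it.
open my $f1, '<', 'FILE1' or die "FILE1: $!";
my %seen;
while (my $key = <$f1>) {
    chomp $key;
    $seen{$key} = undef;               # key only, no satellite data
}
close $f1;

open my $f2,  '<', 'FILE2' or die "FILE2: $!";
open my $out, '>', 'FILE3' or die "FILE3: $!";
while (my $key = <$f2>) {
    chomp $key;
    print $out "$key\n" unless exists $seen{$key};
}
close $f2;
close $out;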


However, if you can't get the entire FILE1 into memory, then 
I'd suggest converting FILE1 into a berkeley database and 
using DB_File to tie it to a hash; from there on, the solution 
would be like the above.
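
A sketch of that DB_File fallback might look like the following (file and 
database names are placeholders; error handling kept minimal); the key set 
lives in a Berkeley DB file on disk, so memory stays small:

#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Keep the FILE1 key set in a Berkeley DB file instead of in RAM.
tie my %seen, 'DB_File', 'file1_keys.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie file1_keys.db: $!";

open my $f1, '<', 'FILE1' or die "FILE1: $!";
while (my $key = <$f1>) {
    chomp $key;
    $seen{$key} = 1;
}
close $f1;

open my $f2, '<', 'FILE2' or die "FILE2: $!";
while (my $key = <$f2>) {
    chomp $key;
    print "$key\n" unless exists $seen{$key};
}
close $f2;

untie %seen;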



HTH







Re: Comparing two files of 8million lines/rows ...

2006-08-17 Thread Rob Dixon

[EMAIL PROTECTED] wrote:

 I have two database tables, one is local and one is on a WAN. They are
 supposed to be in sync, but at the moment they are not. There are
 8 million+ rows on this table.

 I tried to do SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO FROM
 [EMAIL PROTECTED] ), leave that running for hours and all thru the night and 
guess
 what, am still waiting for the result to come out ...

 So what I decided is I want to extract the records into a flat file and then
 write a Perl script to skim thru each line and check whether it exists on the
 other file. While there will be 8million+ lines, the file will not be big
 because I am only extracting one column from the table.

 Does anyone have an existing Perl code that does a similar thing like this
 already? It will be much appreciated if you can send it to me and then I will
 just modify it accordingly.

 Example logic that I have is this:

 FILE1:
 MICKEY
 MINNIE
 DONALD
 GOOFY
 PLUTO

 FILE2:
 MICKEY
 MINNIE
 DONALD
 GOOFY
 PLUTO
 BUGS-BUNNY

 So search FILE1 for all line entries of FILE2 then output whoever does not
 exist into FILE3. So after running the script, I should have ...

 FILE3:
 BUGS-BUNNY

 What I currently have is that I read all of FILE2's entries into an array?
 Read FILE1 one line at a time using m/// and if there is no m///, print that
 to FILE3.

 It seems to work fine for 1000 lines of entries, but am not particularly sure
 how long will that take for 8million+ rows, not even sure if I can create an
 array to contain 8million+ plus rows, if I can't, then am back to doing this
 on the database instead. Another option am looking at is just to read FILE1
 one line at a time and do a grep $string_to_search FILE2 but I do not know
 how to do a grep-like syntax against a file on Perl especially if the search
 string is a variable.

 Why I prefer using a script is so am not putting loads into the database not
 to mention that I can put more logic into the script than on the SQL
 statement.

 Any advice or other options will be very much appreciated. Thanks in
 advance.

No need for flat files if you are using DBI. Read the EMPNO values from
[EMAIL PROTECTED] first as it is a) slower, being on the network, and b) a 
shorter list,
and put these values into a Perl hash. 8 million hash values are unlikely to
take more than about 300MB of memory I would guess, which should be fine on any
recent PC. Fetching one value at a time will prevent there being two copies of
the data in memory at once (an array of retrieved values and the derived hash).

Then just read each EMPNO from EMP and print it out if it isn't in the hash.
This code fragment may help.
Cheers,

Rob


use strict;
use warnings;

use DBI;

my ($dsn, $user, $pass);  # Assign these as appropriate
my $dbh = DBI->connect($dsn, $user, $pass);

my %employee;

my $sth = $dbh->prepare('SELECT EMPNO FROM [EMAIL PROTECTED]');
$sth->execute;
while (my ($empno) = $sth->fetchrow_array) {
  $employee{$empno}++;
}

$sth = $dbh->prepare('SELECT EMPNO FROM EMP');
$sth->execute;
while (my ($empno) = $sth->fetchrow_array) {
  print "$empno\n" unless $employee{$empno};
}






Re: Comparing two files of 8million lines/rows ...

2006-08-17 Thread JeeBee
Just an idea, don't know whether it's useful...
If you can get both files sorted (either by adding order to your sql
query that generates the file, or the commandline 'sort') the problem
becomes much easier.
You'd just have to traverse each file, something like this:

read word_1 from file_1
read word_2 from file_2

while not at the end of the file

  if word_1 < word_2
print: word_1 is not in file_2 !
read next word_1 from file_1

  elseif word_2 < word_1
print: word_2 is not in file_1 !
read next word_2 from file_2

  else
read next word_1 from file_1
read next word_2 from file_2

end while


Well, the stopping criterion is not quite correct yet, but you'll get my
point.
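
A rough Perl rendering of that walk, with the leftover handling spelled out 
(file names are placeholders, and both inputs are assumed to have been 
sorted the same way beforehand):

#!/usr/bin/perl
use strict;
use warnings;

open my $f1, '<', 'file_1.sorted' or die "file_1.sorted: $!";
open my $f2, '<', 'file_2.sorted' or die "file_2.sorted: $!";

my $w1 = <$f1>;
my $w2 = <$f2>;

while ( defined $w1 and defined $w2 ) {
    chomp( my $v1 = $w1 );
    chomp( my $v2 = $w2 );
    if    ( $v1 lt $v2 ) { print "$v1 is not in file_2\n"; $w1 = <$f1>; }
    elsif ( $v2 lt $v1 ) { print "$v2 is not in file_1\n"; $w2 = <$f2>; }
    else                 { $w1 = <$f1>; $w2 = <$f2>; }
}

# Whatever is left over in either file has no counterpart in the other one.
while ( defined $w1 ) { chomp $w1; print "$w1 is not in file_2\n"; $w1 = <$f1>; }
while ( defined $w2 ) { chomp $w2; print "$w2 is not in file_1\n"; $w2 = <$f2>; }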






Re: Comparing two files of 8million lines/rows ...

2006-08-17 Thread Dr.Ruud
[EMAIL PROTECTED] schreef:

 SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO
 FROM [EMAIL PROTECTED] )

Or maybe use something like
select E1.EMPNO from EMP as E1 left join [EMAIL PROTECTED] as E2 on E1.EMPNO =
E2.EMPNO where E2.EMPNO IS NULL
(untested)


 So search FILE1 for all line entries of FILE2 then output whoever
 does not exist

Look into `sort` and `diff` and `uniq -u`.


-- 
Affijn, Ruud

Gewoon is een tijger.







Comparing two files of 8million lines/rows ...

2006-08-16 Thread benbart
Hi all,

I have two database tables, one is local and one is on a WAN. They are supposed
to be in sync, but at the moment they are not. There are 8 million+
rows on this table.

I tried to do SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO FROM
[EMAIL PROTECTED] ), leave that running for hours and all thru the night and 
guess
what, am still waiting for the result to come out ...

So what I decided is I want to extract the records into a flat file and then
write a Perl script to skim thru each line and check whether it exists on the
other file. While there will be 8million+ lines, the file will not be big
because I am only extracting one column from the table.

Does anyone have an existing Perl code that does a similar thing like this
already? It will be much appreciated if you can send it to me and then I will
just modify it accordingly.

Example logic that I have is this:

FILE1:
MICKEY
MINNIE
DONALD
GOOFY
PLUTO

FILE2:
MICKEY
MINNIE
DONALD
GOOFY
PLUTO
BUGS-BUNNY

So search FILE1 for all line entries of FILE2 then output whoever does not exist
into FILE3. So after running the script, I should have ...

FILE3:
BUGS-BUNNY

What I currently have is that I read all of FILE2's entries into an array? Read
FILE1 one line at a time using m/// and if there is no m///, print that to
FILE3.

It seems to work fine for 1000 lines of entries, but am not particularly sure
how long will that take for 8million+ rows, not even sure if I can create an
array to contain 8million+ plus rows, if I can't, then am back to doing this on
the database instead. Another option am looking at is just to read FILE1 one
line at a time and do a grep $string_to_search FILE2 but I do not know how to
do a grep-like syntax against a file on Perl especially if the search string is
a variable.

Why I prefer using a script is so am not putting loads into the database not to
mention that I can put more logic into the script than on the SQL statement.

Any advice or other options will be very much appreciated. Thanks in
advance.








Re: Comparing two files of 8million lines/rows ...

2006-08-16 Thread joseph

[EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Hi all,

Hello,


 I have two database tables, one is local and one is on a WAN. They are 
 supposed to be in sync, but at the moment they are not. There are 
 8 million+ rows on this table.

 I tried to do SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO FROM
 [EMAIL PROTECTED] ), leave that running for hours and all thru the night and 
 guess
 what, am still waiting for the result to come out ...

 So what I decided is I want to extract the records into a flat file and 
 then
 write a Perl script to skim thru each line and check whether it exists on 
 the
 other file. While there will be 8million+ lines, the file will not be big
 because I am only extracting one column from the table.

 Does anyone have an existing Perl code that does a similar thing like this
 already? It will be much appreciated if you can send it to me and then I 
 will
 just modify it accordingly.

 Example logic that I have is this:

 FILE1:
 MICKEY
 MINNIE
 DONALD
 GOOFY
 PLUTO

 FILE2:
 MICKEY
 MINNIE
 DONALD
 GOOFY
 PLUTO
 BUGS-BUNNY


This will work, based on your given data

use strict;
use warnings;

my @file2 = qw/mickey minnie donald goofy pluto/;
my @file1 = qw/mickey minnie donald goofy pluto bunny/;
my %hash = map { $_ => undef } @file2;

for (@file1) {
  unless(exists $hash{$_}) {
    print $_, "\n";
  }
}

output:
bunny

caveats:
   This will only print out the elements that were present in file1 and were 
not in file2,
and I'm also a beginner.

 So search FILE1 for all line entries of FILE2 then output whoever does not 
 exist
 into FILE3. So after running the script, I should have ...

 FILE3:
 BUGS-BUNNY

 What I currently have is that I read all of FILE2's entries into an array? 
 Read
 FILE1 one line at a time using m/// and if there is no m///, print that to
 FILE3.

 It seems to work fine for 1000 lines of entries, but am not particularly 
 sure
 how long will that take for 8million+ rows, not even sure if I can create 
 an
 array to contain 8million+ plus rows, if I can't, then am back to doing 
 this on
 the database instead. Another option am looking at is just to read FILE1 
 one
 line at a time and do a grep $string_to_search FILE2 but I do not know 
 how to
 do a grep-like syntax against a file on Perl especially if the search 
 string is
 a variable.

 Why I prefer using a script is so am not putting loads into the database 
 not to
 mention that I can put more logic into the script than on the SQL 
 statement.

 Any advice or other options will be very much appreciated. Thanks in
 advance.


welcome, HTH.

/joseph 







Comparing two files

2003-07-22 Thread Cynthia Xun Liu
Hi,

Could anybody help me with the code of comparing files? I have two files
:
File A: name, info1, info2...
FileB: name, info1, info2...
I want to print out all the lines in File A with the same names as in
File B.
Thanks.





Re: Comparing two files

2003-07-22 Thread LI NGOK LAM
First, I would ask: how many lines are in each file? Under 100? Above 10,000?
That affects the choice of tactic for getting things done.

Well, I assume it is reasonable to carry 10,000 names, each name no
longer than 20 characters (consuming about 200KB, still acceptable),
and I will do so:

use strict;
my (@names, @matches) ;

# Reading and record names into @names
open my $A, "FileA";
while (my $line = <$A>)
{ my ($name, $waste ) = split /,/, $line, 2;
push (@names, $name)
}close $A;

# Find matching name in B and record them into @matches
open my $B, "FileB";
while (my $line = <$B>)
{ my ($name, $waste )  = split /,/, $line, 2;
push (@matches, $name ) if ( grep /$name/, @names )
} close $B;

# Print results
print "$_<br>\n" for (@matches);
# Omit <br> if the printout is not returned as HTML format;

Remarks :
1. Code not been tested.
2. I suppose your line format is exactly same as you provided.

HTH






- Original Message - 
From: Cynthia Xun Liu [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, July 22, 2003 11:15 PM
Subject: Comparing two files


 Hi,

 Could anybody help me with the code of comparing files? I have two files
 :
 File A: name, info1, info2...
 FileB: name, info1, info2...
 I want to print out all the lines in File A with the same names as in
 File B.
 Thanks.










Re: Comparing two files

2001-06-09 Thread Will W

If your system's memory is large enough to hold the smaller dataset,
then as others have said, working with hashes is the way to go:

read all of small dataset into hash
while another record in large dataset
    if key for record exists in hash
        delete hash{key}    # result is thus an XOR of keys
    else
        add record to hash
write hash as the output file
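
In Perl, that outline might come out roughly like this (a sketch with 
placeholder file names; the surviving keys are the symmetric difference 
of the two key sets):

#!/usr/bin/perl
use strict;
use warnings;

my %keys;

# read all of the small dataset into the hash
open my $small, '<', 'small.txt' or die "small.txt: $!";
while (<$small>) {
    chomp;
    $keys{$_} = 1;
}
close $small;

# walk the large dataset: drop keys seen in both, add keys seen only here
open my $large, '<', 'large.txt' or die "large.txt: $!";
while (<$large>) {
    chomp;
    if ( exists $keys{$_} ) {
        delete $keys{$_};
    }
    else {
        $keys{$_} = 1;
    }
}
close $large;

# write the surviving keys (the XOR of the two key sets) as the output file
open my $out, '>', 'unique.txt' or die "unique.txt: $!";
print $out "$_\n" for keys %keys;
close $out;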

If the amount of satellite data is too great for this approach to work,
I would use a two stage approach by reading only the key fields and
using file1 and file2 as the appropriate hash values. Then build the
output file by looking up each record in the hash. There
may be more efficient ways to do this, but I like to keep my thoughts
simple.

Will




- Original Message -
From: Steve Whittle [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, June 06, 2001 8:46 AM
Subject: Comparing two files


 Hi,

 I'm trying to write a script that removes duplicates between two files
and
 writes the unique values to a new file. For example, have one file
with the
 following file 1:

 red
 green
 blue
 black
 grey

 and another file 2:

 black
 red

 and I want to create a new file that contains:

 green
 blue
 grey

 I have written a script that takes each entry in file 1 and then reads
 through file 2 to see if it exists there, if not, it writes it to a
new
 file. If there is a duplicate, nothing is written to the new file. The
real
 file 1 I'm dealing with has more than 2 million rows and the real file
2 has
 more than 100,000 rows so I don't think my method is very efficient.
I've
 looked through the web and perl references and can't find an easier
way. Am
 I missing something? Any ideas?

 Thanks,

 Steve Whittle






Re: Comparing two files

2001-06-09 Thread subbu cherukuwada

If you have ample RAM, you may try using grep, i.e.
open(SMALL,"smallfile");
open(LARGE,"largefile");
open(UNIQ,">newfile");
while(<LARGE>) {
$lin=$_;
print UNIQ $lin if(!(grep /^$lin$/, <SMALL>));
seek(SMALL,0,0);
}





From: Will W [EMAIL PROTECTED]
Reply-To: Will W [EMAIL PROTECTED]
To: Steve Whittle [EMAIL PROTECTED], 
[EMAIL PROTECTED]
Subject: Re: Comparing two files
Date: Sat, 9 Jun 2001 07:35:05 -0700

If your system's memory is large enough to hold the smaller dataset,
then as others have said, working with hashes is the way to go:

 read all of small dataset into hash
 while another record in large dataset
     if key for record exists in hash
         delete hash{key}    # result is thus an XOR of keys
     else
         add record to hash
 write hash as the output file

If the amount of satellite data is too great for this approach to work,
I would use a two stage approach by reading only the key fields and
using file1 and file2 as the appropriate hash values. Then build the
output file by looking up each record in the hash. There
may be more efficient ways to do this, but I like to keep my thoughts
simple.

Will




- Original Message -
From: Steve Whittle [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, June 06, 2001 8:46 AM
Subject: Comparing two files


  Hi,
 
  I'm trying to write a script that removes duplicates between two files
and
  writes the unique values to a new file. For example, have one file
with the
  following file 1:
 
  red
  green
  blue
  black
  grey
 
  and another file 2:
 
  black
  red
 
  and I want to create a new file that contains:
 
  green
  blue
  grey
 
  I have written a script that takes each entry in file 1 and then reads
  through file 2 to see if it exists there, if not, it writes it to a
new
  file. If there is a duplicate, nothing is written to the new file. The
real
  file 1 I'm dealing with has more than 2 million rows and the real file
2 has
  more than 100,000 rows so I don't think my method is very efficient.
I've
  looked through the web and perl references and can't find an easier
way. Am
  I missing something? Any ideas?
 
  Thanks,
 
  Steve Whittle
 
 


_
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com.




Comparing two files

2001-06-06 Thread Steve Whittle

Hi,

I'm trying to write a script that removes duplicates between two files and
writes the unique values to a new file. For example, have one file with the
following file 1:

red
green
blue
black
grey

and another file 2:

black
red

and I want to create a new file that contains:

green
blue
grey

I have written a script that takes each entry in file 1 and then reads
through file 2 to see if it exists there, if not, it writes it to a new
file. If there is a duplicate, nothing is written to the new file. The real
file 1 I'm dealing with has more than 2 million rows and the real file 2 has
more than 100,000 rows so I don't think my method is very efficient. I've
looked through the web and perl references and can't find an easier way. Am
I missing something? Any ideas?

Thanks,

Steve Whittle




Re: Comparing two files

2001-06-06 Thread Ondrej Par

Hi,

one approach is to sort the files first, and work with sorted files - you 
then need to read them only once.

Second approach is to load the smaller file into memory - to create a hash with 
something like 
while (<FILE1>) { chomp; $found1{$_}++; }
and then read the second file and compare it:
while (<FILE2>) { 
chomp;
if ($found1{$_}) {
$found_both{$_}++;
} else {
print "$_\n";
}
}
foreach (keys %found1) {
print "$_\n" unless $found_both{$_};
}
but this will consume a lot of memory.


On Wednesday 06 June 2001 17:46, Steve Whittle wrote:
 Hi,

 I'm trying to write a script that removes duplicates between two files and
 writes the unique values to a new file. For example, have one file with the
 following file 1:

 red
 green
 blue
 black
 grey

 and another file 2:

 black
 red

 and I want to create a new file that contains:

 green
 blue
 grey

 I have written a script that takes each entry in file 1 and then reads
 through file 2 to see if it exists there, if not, it writes it to a new
 file. If there is a duplicate, nothing is written to the new file. The real
 file 1 I'm dealing with has more than 2 million rows and the real file 2
 has more than 100,000 rows so I don't think my method is very efficient.
 I've looked through the web and perl references and can't find an easier
 way. Am I missing something? Any ideas?

 Thanks,

 Steve Whittle

-- 
Ondrej Par
Internet Securities
Software Engineer
e-mail: [EMAIL PROTECTED]
Phone: +420 2 222 543 45 ext. 112




Re: Comparing two files

2001-06-06 Thread Bob Mangold

Reading the files into hashes is definitely the answer. If you need an example
look at section 5.11 of the Perl Cookbook.

-Bob
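
For what it's worth, a small sketch of that hash lookup applied to the 
two-file case (placeholder file names; an illustration, not the Cookbook's 
own recipe text):

#!/usr/bin/perl
use strict;
use warnings;

my %in_file2;

# mark everything that appears in file 2
open my $f2, '<', 'file2.txt' or die "file2.txt: $!";
while (<$f2>) {
    chomp;
    $in_file2{$_} = 1;
}
close $f2;

# print only the file 1 lines that were never marked
open my $f1,  '<', 'file1.txt' or die "file1.txt: $!";
open my $out, '>', 'file3.txt' or die "file3.txt: $!";
while (<$f1>) {
    chomp;
    print $out "$_\n" unless $in_file2{$_};
}
close $f1;
close $out;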

--- Eduard Grinvald [EMAIL PROTECTED] wrote:
 As suggested earlier for a similiar problem, dumping everything into a 
 single hash as keys and then printing that out seems simple and 
 efficient enough.
 
 __END__
 =sincerely, eduard grinvald
 =email: [EMAIL PROTECTED]
 =dev-email: [EMAIL PROTECTED]
 =dev-site: r-x-linux.sourceforge.net
 =icq:   114099136
 =cut
 
 
 - Original Message -
 From: Steve Whittle [EMAIL PROTECTED]
 Date: Wednesday, June 6, 2001 11:46 am
 Subject: Comparing two files
 
  Hi,
  
  I'm trying to write a script that removes duplicates between two 
  files and
  writes the unique values to a new file. For example, have one file 
  with the
  following file 1:
  
  red
  green
  blue
  black
  grey
  
  and another file 2:
  
  black
  red
  
  and I want to create a new file that contains:
  
  green
  blue
  grey
  
  I have written a script that takes each entry in file 1 and then reads
  through file 2 to see if it exists there, if not, it writes it to 
  a new
  file. If there is a duplicate, nothing is written to the new file. 
  The real
  file 1 I'm dealing with has more than 2 million rows and the real 
  file 2 has
  more than 100,000 rows so I don't think my method is very 
  efficient. I've
  looked through the web and perl references and can't find an 
  easier way. Am
  I missing something? Any ideas?
  
  Thanks,
  
  Steve Whittle
  
  
 


__
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail - only $35 
a year!  http://personal.mail.yahoo.com/



Re: Comparing two files

2001-06-06 Thread Gary Luther



Unless I am missing the point of the question, this seems to 
me like an Intersection of Arrays problem which is covered in every Perl book 
that I have seen under hashes.

Basically:

%seen=();
foreach (@array1) {
 $seen{$_}=1;
}
@intersection=grep($seen{$_}, @array2);



---
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-- Benjamin Franklin

Gary Luther
SAF, 2500 Broadway, Helena, MT 59602
[EMAIL PROTECTED]
Visit our website at http://www.safmt.org
