RE: Re: Perl code for comparing two files
-----Original Message-----
From: news [mailto:n...@ger.gmane.org] On Behalf Of Richard Loveland
Sent: Friday, May 08, 2009 11:59
To: beginners@perl.org
Subject: Re: Perl code for comparing two files

Mr. Adhikary,

The following will take any number of files as arguments, in the format you described (I even tested it! :-)). It goes through each line of those files, stuffing (the relevant part of) each line into a 'seen' hash (more on that, and other, hash techniques here if you're interested: http://www.perl.com/pub/a/2006/11/02/all-about-hashes.html). The code below does not keep track of line numbers as you requested, but I think the hash technique used here could help you as you approach a solution to your particular problem.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp;   # This is where 'read_file' lives

    my %seen;
    for my $arg ( @ARGV ) {
        my @lines = read_file( $arg );
        for my $line ( @lines ) {
            chomp $line;
            my @elems = split / /, $line;
            my $value = $elems[1];
            $seen{$value}++;
        }
    }
    for my $k ( keys %seen ) {
        print $k, "\n" if $seen{$k} > 1;
    }

This is similar to the above, but without File::Slurp, and it uses a hash combined with an array: element [0] is the count of seen items, and each later index (one per file) holds the line number where the item was seen in that file. I have given you a Data::Dumper dump. I ran it with the files you provided.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %seen;
    my $MyLineNbr = 1;
    my %MFN = ();
    my $MyFilenames = \%MFN;
    my $MyFileCnt = 1;
    my $MyCurrFile = q[];

    while ( <> ) {
        if ( $ARGV ne $MyCurrFile ) {
            printf "Filename: %s (%d)\n", $ARGV, $MyFileCnt;
            $MyCurrFile = $ARGV;
            $MyFilenames->{$MyCurrFile} = $MyFileCnt++;
            $MyLineNbr = 0;
        }
        chomp;
        $MyLineNbr++;
        next if ( /^\s*$/ );
        my @elems = split (/ /, $_);
        my $value = $elems[1];
        $seen{$value}[0]++;
        $seen{$value}[$MyFilenames->{$MyCurrFile}] = $MyLineNbr;
    }
    print Dumper(\%seen);

Regards,
Rich Loveland

Anirban Adhikary wrote:
> [snip: original post, sample files, and code, quoted in full elsewhere in this thread]
RE: Perl code for comparing two files
-----Original Message-----
From: Anirban Adhikary [mailto:anirban.adhik...@gmail.com]
Sent: Monday, May 04, 2009 06:40
To: beginners@perl.org
Subject: Perl code for comparing two files

> Hi List
> I am writing Perl code which takes two or more files as arguments. It checks the values of each line of one file against another file. If a value matches, it writes the value along with its line number to another (say output) file.
> [snip: sample files abc.txt, xyz.txt, pqr.txt and code, quoted in full elsewhere in this thread]

Here is a way where a 'seen' hash has two array elements: [0] is the count; [1] holds the file number and line number for each time the item was seen. Code starts on the next line:

    use strict;
    use warnings;
    use Data::Dumper;

    my %seen;
    my $MyLineNbr = 1;
    my %MFN = ();
    my $MyFilenames = \%MFN;
    my $MyFileCnt = 1;
    my $MyCurrFile = q[];
    my $MyActIdx = 1;

    while ( <> ) {
        if ( $ARGV ne $MyCurrFile ) {
            printf "Filename: %s\n", $ARGV;
            $MyCurrFile = $ARGV;
            $MyFilenames->{$MyCurrFile} = $MyFileCnt++;
            $MyLineNbr = 0;
        }
        chomp;
        $MyLineNbr++;
        next if ( /^\s*$/ );
        my @elems = split (/ /, $_);
        my $value = $elems[1];
        $seen{$value}[0]++;
        $seen{$value}[$MyActIdx] .= $MyFilenames->{$MyCurrFile} . q[;] . $MyLineNbr . q[^];
    }
    print Dumper(\%seen);
    ^--- code ends here

I leave it to you to get the output, but this should give you what you need to work with. If you have any questions and/or problems, please let me know.

Thanks.

Wags ;)
David R. Wagner
Senior Programmer Analyst, FedEx Freight
1.719.484.2097 TEL
1.719.484.2419 FAX
1.408.623.5963 Cell
http://fedex.com/us

> [snip: remainder of the quoted original code]
Re: Perl code for comparing two files
Wagner, David --- Senior Programmer Analyst --- CFS wrote:
> -----Original Message-----
> From: Anirban Adhikary [mailto:anirban.adhik...@gmail.com]
> Sent: Monday, May 04, 2009 06:40
> To: beginners@perl.org
> Subject: Perl code for comparing two files
>
> [snip: original post and sample files]
>
> Here is a way where a 'seen' hash has two array elements: [0] is the count; [1] holds the file number and line number for each seen item. Code starts on the next line:
>
> use strict;
> use warnings;
> use Data::Dumper;
> my %seen;
> my $MyLineNbr = 1;

Why? Perl already has a built-in line number variable.

> my %MFN = ();
> my $MyFilenames = \%MFN;

Why declare two variables when you are only using one? Why use a hash reference instead of just using a hash (like you do with %seen)?

> my $MyFileCnt = 1;
> my $MyCurrFile = q[];
> my $MyActIdx = 1;

This value never changes, so why use a variable? Why no variable for index 0 of the array?

> while ( <> ) {
>     if ( $ARGV ne $MyCurrFile ) {
>         printf "Filename: %s\n", $ARGV;

Why not just:

    print "Filename: $ARGV\n";

>         $MyCurrFile = $ARGV;
>         $MyFilenames->{$MyCurrFile} = $MyFileCnt++;

Why not just:

    $MyFilenames->{$MyCurrFile}++;

>         $MyLineNbr = 0;
>     }
>     chomp;
>     $MyLineNbr++;
>     next if ( /^\s*$/ );
>     my @elems = split (/ /, $_);
>     my $value = $elems[1];

Why not just:

    my $value = ( split )[ 1 ];

>     $seen{$value}[0]++;

Why does this array element use a literal number and the next line use a variable?

>     $seen{$value}[$MyActIdx] .= $MyFilenames->{$MyCurrFile} . q[;] . $MyLineNbr . q[^];

Since you are using a HoA anyway, just push() "$MyFilenames->{$MyCurrFile};$MyLineNbr" onto the array, and the count will be the array in scalar context.

> }
> print Dumper(\%seen);
> ^--- code ends here
>
> I leave it to you to get the output, but this should give you what you need to work with. If you have any questions and/or problems, please let me know. Thanks.

200 lines trimmed. Please trim your posts and remove extraneous junk at the end.

John
--
Those people who think they know everything are a great annoyance to those of us who do. -- Isaac Asimov

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
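A rough sketch of what John's suggestions add up to (my own naming throughout, untested against the original data): one plain hash of arrays keyed by value, with each element recording "file_number;line_number"; the duplicate count then just falls out of the array in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scan each "lineno value" file; record every sighting of a value as
# "file_number;line_number" in a hash of arrays. $. supplies the line
# number per filehandle, so no hand-rolled counter is needed.
sub scan_files {
    my @files = @_;
    my (%seen, $file_no);
    for my $path (@files) {
        $file_no++;
        open my $fh, '<', $path or die "could not open $path: $!";
        while (my $line = <$fh>) {
            chomp $line;
            next if $line =~ /^\s*$/;
            my $value = ( split ' ', $line )[1];
            push @{ $seen{$value} }, "$file_no;$.";
        }
        close $fh;
    }
    return \%seen;
}

# A value was seen more than once when @{ $seen->{$value} } > 1.
```

The push() keeps the count and the locations in one structure, which is the point John is making about scalar context.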
Re: Perl code for comparing two files
Mr. Adhikary,

The following will take any number of files as arguments, in the format you described (I even tested it! :-)). It goes through each line of those files, stuffing (the relevant part of) each line into a 'seen' hash (more on that, and other, hash techniques here if you're interested: http://www.perl.com/pub/a/2006/11/02/all-about-hashes.html). The code below does not keep track of line numbers as you requested, but I think the hash technique used here could help you as you approach a solution to your particular problem.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp;   # This is where 'read_file' lives

    my %seen;
    for my $arg ( @ARGV ) {
        my @lines = read_file( $arg );
        for my $line ( @lines ) {
            chomp $line;
            my @elems = split / /, $line;
            my $value = $elems[1];
            $seen{$value}++;
        }
    }
    for my $k ( keys %seen ) {
        print $k, "\n" if $seen{$k} > 1;
    }

Regards,
Rich Loveland

Anirban Adhikary wrote:
> Hi List
> I am writing Perl code which takes two or more files as arguments. It checks the values of each line of one file against another file. If a value matches, it writes the value along with its line number to another (say output) file.
> [snip: sample files and code, quoted in full elsewhere in this thread]
Perl code for comparing two files
Hi List
I am writing Perl code which takes two or more files as arguments. It checks the values of each line of one file against another file. If a value matches, it writes the value along with its line number to another (say output) file. The source files are as follows:

Contents of abc.txt
1 2325278241,P0
2 2296250723,MH
3 2296250724,MH
4 2325277178,P0
5 7067023316,WL
6 7067023329,WL
7 2296250759,MH
8 7067023453,WL
9 7067023455,WL
10 555413,EA05
###
Contents of xyz.txt
1 7067023453,WL
2 31-DEC-27,2O,7038590671
3 31-DEC-27,2O,7038596464
4 31-DEC-27,2O,7038596482
5 2296250724,MH
6 31-DEC-27,2O,7038597632
7 31-DEC-27,2O,7038589511
8 31-DEC-11,2O,7038590671
9 7067023455,WL
10 31-DEC-27,2O,7038555744
###
Contents of pqr.txt
1 2325278241,P0
2 7067023316,WL
3 7067023455,WL
4 2296250724,MH

For this requirement I have written the following code, which works fine for 2 input files:

    use strict;
    use warnings;
    use Benchmark;

    if (@ARGV < 2) {
        print "Please enter at least two or more .orig file names\n";
        exit 0;
    }
    my @file_names = @ARGV;
    chomp(@file_names);
    my @files_to_process;
    for (@file_names) {
        if ( -s $_ ) {
            print "File $_ exists\n";
            push(@files_to_process, $_);
        }
        elsif ( -e $_ ) {
            print "File $_ exists but it has zero byte size\n";
        }
        else {
            print "File $_ does not exist\n";
        }
    }
    my $count = @files_to_process;
    if ( $count < 2 ) {
        print "At least 2 .orig files are required to continue this program\n";
        exit 0;
    }
    my $output_file = 'outputfile';
    my $value = 0;
    my $start_time = new Benchmark;
    if ( $count >= 2 ) {
        while ( $count > 1 ) {
            my ($files_after_processing_pointer, $return_val) =
                create_itermediate_file(\@files_to_process, $value);
            my @files_after_processing = @$files_after_processing_pointer;
            $count = @files_after_processing;
            $value = $return_val;
            @files_to_process = @files_after_processing;
        }
        my $end_time = new Benchmark;
        my $difference = timediff($end_time, $start_time);
        print "It took ", timestr($difference), " to execute the program\n";
    }

    sub create_itermediate_file {
        my $file_pointer = $_[0];
        my $counter = $_[1];
        my @file_content = @$file_pointer;
        if ($counter == 0) {
            my ($first_file, $second_file) = splice(@file_content, 0, 2);
            open my $orig_first, '<', $first_file
                or die "could not open $first_file: $!";
            open my $orig_second, '<', $second_file
                or die "could not open $second_file: $!";
            open my $output_fh, '>', $output_file
                or die "could not open $output_file: $!";
            my %content_first;
            while (my $line = <$orig_first>) {
                chomp $line;
                if ($line) {
                    my ($line_num, $value) = split(' ', $line);
                    $content_first{$value} = $line_num;
                }
            }
            my %content_second;
            while (my $line = <$orig_second>) {
                chomp $line;
                if ($line) {
                    my ($line_num, $value) = split(' ', $line);
                    $content_second{$value} = $line_num;
                }
            }
            foreach my $key (sort keys %content_second) {
                if (exists $content_first{$key}) {
                    print $output_fh "$content_second{$key} $key,\n";
                }
            }
            $counter += 1;
            return (\@file_content, $counter);
        }
        if ($counter != 0) {
            my $file_pointer = $_[0];
            my $counter = $_[1];
            my @file_content_mod = @$file_pointer;
            my ($file_to_process) = shift(@file_content_mod);
            open my $orig_file, '<', $file_to_process
                or die "could not open $file_to_process: $!";
            open my $output_fh, , $output_file

[the rest of the message was truncated in the archive]
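The core step the program above repeats (intersect one "lineno value" file against another on the value field) can be sketched compactly with a single hash; this is my own naming and layout, not the poster's code, and it shows only one pass rather than the repeated pairing the original performs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load "lineno value" records from the first file into a hash keyed by
# value, then emit every value from the second file that also appears
# in the first, with its line number in the second file.
sub intersect_files {
    my ($first, $second) = @_;

    my %in_first;
    open my $fh, '<', $first or die "could not open $first: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my ($lineno, $value) = split ' ', $line;
        $in_first{$value} = $lineno;
    }
    close $fh;

    my @matches;
    open $fh, '<', $second or die "could not open $second: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my ($lineno, $value) = split ' ', $line;
        push @matches, "$lineno $value" if exists $in_first{$value};
    }
    close $fh;
    return @matches;
}
```

Hash lookup makes each pass linear in the file sizes, which is the property the whole thread keeps circling back to.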
Problem in comparing two files using compare.pm
Hi,
Please find below my code to compare 2 files, source.txt and destination.txt, in folder seek:

    #!perl
    use File::Compare;

    if (Compare("source.txt", "destination.txt") == 0) {
        print "They're equal\n";
    }

Please find below the error I am getting while running the code at the command prompt:

    D:\seek> perl filematch.pl
    syntax error at C:\Program Files\Perl\lib/File/Compare.pm line 3, near "use 5.005_64"
    BEGIN failed--compilation aborted at filematch.pl line 3.

Please help me to rectify this problem. Note: on my other computer with Perl this code works properly.

Regards,
Prabhu
Re: Problem in comparing two files using compare.pm
On Wednesday 29 August 2007 00:43:52 [EMAIL PROTECTED] wrote:
> Hi,
> Please find below my code to compare 2 files, source.txt and destination.txt, in folder seek:
>
>     #!perl
>     use File::Compare;
>
>     if (Compare("source.txt", "destination.txt") == 0) {
>         print "They're equal\n";
>     }
>
> Please find below the error I am getting while running the code at the command prompt:
>
>     D:\seek> perl filematch.pl
>     syntax error at C:\Program Files\Perl\lib/File/Compare.pm line 3, near "use 5.005_64"
>     BEGIN failed--compilation aborted at filematch.pl line 3.
>
> Please help me to rectify this problem. Note: on my other computer with Perl this code works properly.
> Regards, Prabhu

The version of Perl on that computer is probably older than that required by the use statement.
Re: Comparing two files of 8million lines/rows ...
On 08/16/2006 04:35 PM, [EMAIL PROTECTED] wrote:
> Hi all,
> I have two database tables, one local and one on a WAN, that are supposed to be in sync but at the moment are not. There are 8 million+ rows on this table. [snip: full question, quoted in the original post elsewhere in this thread]
> Any advice or other options will be very much appreciated. Thanks in advance.

Obviously, you want to do this as efficiently as possible because of the humongous size of the data. Hashes are the fastest structures for letting you know whether some data has already been seen. Your problem is core. Do you have enough core memory to read all of the data from one of the table columns into memory? If so, then the solution is almost trivial; if not, then it's probably not trivial, but also not hard.

My advice is to attempt to suck the entire column for one table into memory; FILE1 should become hash keys (with empty values). Then you would open FILE2, loop through the lines (records), and output any record that does not appear in the hash. However, if you can't get the entire FILE1 into memory, then I'd suggest converting FILE1 into a Berkeley database and using DB_File to tie it to a hash; from there on, the solution would be like the above.

HTH
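The in-memory variant described above can be sketched as a small function (file handling and naming are mine; the poster described the approach in prose only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keys of FILE1 go into a hash with empty values; every FILE2 record
# that is absent from the hash is returned (the FILE3 content).
sub records_missing_from {
    my ($file1, $file2) = @_;

    my %seen;
    open my $fh, '<', $file1 or die "could not open $file1: $!";
    while (my $key = <$fh>) {
        chomp $key;
        $seen{$key} = undef;   # empty value: only existence matters
    }
    close $fh;

    my @missing;
    open $fh, '<', $file2 or die "could not open $file2: $!";
    while (my $key = <$fh>) {
        chomp $key;
        push @missing, $key unless exists $seen{$key};
    }
    close $fh;
    return @missing;
}
```

If FILE1 will not fit in memory, the same loop works unchanged after tying %seen to an on-disk hash with DB_File, as the answer suggests.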
Re: Comparing two files of 8million lines/rows ...
[EMAIL PROTECTED] wrote:
> I have two database tables, one local and one on a WAN, that are supposed to be in sync but at the moment are not. There are 8 million+ rows on this table. [snip: full question, quoted in the original post elsewhere in this thread]
> Any advice or other options will be very much appreciated. Thanks in advance.

No need for flat files if you are using DBI. Read the EMPNO values from [EMAIL PROTECTED] first, as it is a) slower, being on the network, and b) a shorter list, and put these values into a Perl hash. 8 million hash values are unlikely to take more than about 300MB of memory, I would guess, which should be fine on any recent PC. Fetching one value at a time will prevent there being two copies of the data in memory at once (an array of retrieved values and the derived hash). Then just read each EMPNO from EMP and print it out if it isn't in the hash. This code fragment may help.

Cheers,
Rob

    use strict;
    use warnings;
    use DBI;

    my ($dsn, $user, $pass);    # Assign these as appropriate
    my $dbh = DBI->connect($dsn, $user, $pass);

    my %employee;
    my $sth = $dbh->prepare('SELECT EMPNO FROM [EMAIL PROTECTED]');
    $sth->execute;
    while (my ($empno) = $sth->fetchrow_array) {
        $employee{$empno}++;
    }

    $sth = $dbh->prepare('SELECT EMPNO FROM EMP');
    $sth->execute;
    while (my ($empno) = $sth->fetchrow_array) {
        print "$empno\n" unless $employee{$empno};
    }
Re: Comparing two files of 8million lines/rows ...
Just an idea, don't know whether it's useful... If you can get both files sorted (either by adding an ORDER BY to the SQL query that generates the file, or with the command-line 'sort'), the problem becomes much easier. You'd just have to traverse each file, something like this:

    read word_1 from file_1
    read word_2 from file_2
    while not at the end of the file
        if word_1 < word_2
            print: word_1 is not in file_2 !
            read next word_1 from file_1
        elseif word_2 < word_1
            print: word_2 is not in file_1 !
            read next word_2 from file_2
        else
            read next word_1 from file_1
            read next word_2 from file_2
    end while

Well, the stopping criterion is not quite correct yet, but you'll get my point.
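The sorted-merge walk above can be written in Perl roughly as follows (my own naming; it assumes both files are already sorted ascending, one word per line, and it also handles the leftover-tail case the pseudocode leaves open):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk two sorted files in lockstep; collect the words unique to each.
sub diff_sorted {
    my ($path1, $path2) = @_;
    open my $f1, '<', $path1 or die "could not open $path1: $!";
    open my $f2, '<', $path2 or die "could not open $path2: $!";

    my ($w1, $w2) = (scalar <$f1>, scalar <$f2>);
    my (@only1, @only2);
    while (defined $w1 and defined $w2) {
        chomp(my $x = $w1);
        chomp(my $y = $w2);
        if    ($x lt $y) { push @only1, $x; $w1 = <$f1>; }
        elsif ($y lt $x) { push @only2, $y; $w2 = <$f2>; }
        else             { $w1 = <$f1>; $w2 = <$f2>; }  # in both files
    }
    # One file ran out: everything left in the other is unique to it.
    while (defined $w1) { chomp $w1; push @only1, $w1; $w1 = <$f1>; }
    while (defined $w2) { chomp $w2; push @only2, $w2; $w2 = <$f2>; }
    return (\@only1, \@only2);
}
```

This needs only constant memory regardless of file size, which is its appeal for the 8-million-row case.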
Re: Comparing two files of 8million lines/rows ...
[EMAIL PROTECTED] schreef:
> SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO FROM [EMAIL PROTECTED] )

Or maybe use something like

    select E1.EMPNO
    from EMP as E1
    left join [EMAIL PROTECTED] as E2 on E1.EMPNO = E2.EMPNO
    where E2.EMPNO IS NULL

(untested)

> So search FILE1 for all line entries of FILE2 then output whoever does not exist

Look into `sort` and `diff` and `uniq -u`.

--
Affijn, Ruud

"Gewoon is een tijger." [Dutch: "Ordinary is a tiger."]
Comparing two files of 8million lines/rows ...
Hi all,

I have two database tables, one is local and one is on a WAN. They are supposed to be in sync, but at the moment they are not. There are 8 million+ rows on this table. I tried to do

    SELECT EMPNO FROM EMP WHERE EMPNO NOT IN ( SELECT EMPNO FROM [EMAIL PROTECTED] )

left that running for hours and all through the night, and guess what, I am still waiting for the result to come out...

So what I decided is to extract the records into a flat file and then write a Perl script to skim through each line and check whether it exists in the other file. While there will be 8 million+ lines, the file will not be big because I am only extracting one column from the table. Does anyone have existing Perl code that does a similar thing already? It will be much appreciated if you can send it to me and then I will just modify it accordingly. Example logic that I have is this:

FILE1:
MICKEY
MINNIE
DONALD
GOOFY
PLUTO

FILE2:
MICKEY
MINNIE
DONALD
GOOFY
PLUTO
BUGS-BUNNY

So search FILE1 for all line entries of FILE2, then output whoever does not exist into FILE3. So after running the script, I should have...

FILE3:
BUGS-BUNNY

What I currently have is that I read all of FILE2's entries into an array, read FILE1 one line at a time using m///, and if there is no match, print that line to FILE3. It seems to work fine for 1000 lines of entries, but I am not particularly sure how long that will take for 8 million+ rows; I am not even sure whether I can create an array to contain 8 million+ rows, and if I can't, then I am back to doing this on the database instead. Another option I am looking at is to read FILE1 one line at a time and do a `grep $string_to_search FILE2`, but I do not know how to do a grep-like search against a file in Perl, especially if the search string is a variable. Why I prefer using a script is so I am not putting load on the database, not to mention that I can put more logic into the script than into the SQL statement.

Any advice or other options will be very much appreciated. Thanks in advance.
Re: Comparing two files of 8million lines/rows ...
[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Hi all,

Hello,

> I have two database tables, one local and one on a WAN, that are supposed to be in sync but are not. There are 8 million+ rows. [snip: full question, quoted in the original post elsewhere in this thread]
>
> FILE1: MICKEY MINNIE DONALD GOOFY PLUTO
> FILE2: MICKEY MINNIE DONALD GOOFY PLUTO BUGS-BUNNY

This will work, based on your given data:

    use strict;
    use warnings;

    my @file2 = qw/mickey minnie donald goofy pluto/;
    my @file1 = qw/mickey minnie donald goofy pluto bunny/;

    my %hash = map { $_ => undef } @file2;

    for (@file1) {
        unless (exists $hash{$_}) {
            print $_, "\n";
        }
    }

output:

    bunny

caveats: This will only print out the elements that were present in file1 and not in file2, and I'm also a beginner.

> So search FILE1 for all line entries of FILE2, then output whoever does not exist into FILE3. [snip: rest of question]

welcome, HTH.

/joseph
Comparing two files
Hi, could anybody help me with code for comparing files? I have two files:

File A: name, info1, info2...
File B: name, info1, info2...

I want to print out all the lines in File A with the same names as in File B. Thanks.
Re: Comparing two files
First, I would ask: how many lines in each file? Under 100? Above 10,000? That affects the choice of tactic for getting things done. Well, I will assume it is reasonable to carry about 10,000 names, each no longer than 20 characters (about 200KB consumed, still acceptable), and I would do this:

use strict;
my (@names, @matches);

# Read names from file A into @names
open my $A, '<', 'FileA' or die "FileA: $!";
while (my $line = <$A>) {
    my ($name, $rest) = split /,/, $line, 2;
    push @names, $name;
}
close $A;

# Find matching names in B and record them into @matches
open my $B, '<', 'FileB' or die "FileB: $!";
while (my $line = <$B>) {
    my ($name, $rest) = split /,/, $line, 2;
    push @matches, $name if grep /$name/, @names;
}
close $B;

# Print results
print "$_<br>\n" for @matches;   # omit <br> if the printout is not returned as HTML

Remarks:
1. Code has not been tested.
2. I assume your line format is exactly as you provided.

HTH

- Original Message - From: Cynthia Xun Liu [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Tuesday, July 22, 2003 11:15 PM Subject: Comparing two files
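Two caveats about the sketch above: the grep inside the loop rescans @names once per line of FileB, and /$name/ is a regex (substring) match, so "ann" would match "joanne". A hash of names fixes both, giving exact O(1) lookups. A sketch under the same comma-separated line format (sub name and file arguments are hypothetical):

```perl
use strict;
use warnings;

# Return the lines of $file_a whose leading "name" field also occurs
# as the leading field of some line in $file_b. Exact match via a
# hash, so there are no accidental substring hits and no O(N*M) scan.
sub lines_with_shared_names {
    my ($file_a, $file_b) = @_;

    my %in_b;
    open my $fb, '<', $file_b or die "$file_b: $!";
    while (my $line = <$fb>) {
        chomp $line;
        my ($name) = split /,/, $line, 2;
        $in_b{$name} = 1;
    }
    close $fb;

    my @matches;
    open my $fa, '<', $file_a or die "$file_a: $!";
    while (my $line = <$fa>) {
        chomp $line;
        my ($name) = split /,/, $line, 2;
        push @matches, $line if exists $in_b{$name};
    }
    close $fa;
    return @matches;
}
```

This also answers the original question more directly, since it returns the full File A lines rather than just the names.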
Re: Comparing two files
If your system's memory is large enough to hold the smaller dataset, then, as others have said, working with hashes is the way to go:

  read all of small dataset into hash
  while another record in large dataset
      if key for record exists in hash
          delete hash{key}    # result is thus an XOR of the keys
      else
          add record to hash
  write hash as the output file

If the amount of satellite data is too great for this approach to work, I would use a two-stage approach: read only the key fields, using "file1" and "file2" as the appropriate hash values, then build the output file by looking up each record described in the hash. There may be more efficient ways to do this, but I like to keep my thoughts simple.

Will

- Original Message - From: Steve Whittle [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, June 06, 2001 8:46 AM Subject: Comparing two files
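The pseudocode above can be rendered as a small Perl sub. This is a sketch, assuming each key appears at most once per dataset (a repeated key in the large dataset would toggle in and out of the hash); the sub name and the use of array refs in place of file reads are my own:

```perl
use strict;
use warnings;

# Symmetric difference (XOR) of two key lists, per the pseudocode:
# seed the hash with the small set, then delete shared keys and add
# keys only the large set has. What remains is unique to one side.
sub key_xor {
    my ($small, $large) = @_;    # array refs of keys
    my %h = map { $_ => 1 } @$small;
    for my $k (@$large) {
        if (exists $h{$k}) {
            delete $h{$k};       # present in both: drop it
        }
        else {
            $h{$k} = 1;          # present only in the large set
        }
    }
    return sort keys %h;
}
```

On Steve's sample data (file 1: red green blue black grey; file 2: black red) this yields blue, green, grey.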
Re: Comparing two files
If you have ample RAM, you may try using grep, i.e.:

open SMALL, '<', 'smallfile' or die $!;
open LARGE, '<', 'largefile' or die $!;
open UNIQ,  '>', 'newfile'   or die $!;
while (<LARGE>) {
    my $lin = $_;
    print UNIQ $lin unless grep { /^\Q$lin\E$/ } <SMALL>;
    seek SMALL, 0, 0;   # rewind the small file for the next line
}

(Note that this rereads the small file once per line of the large file, so it will be slow for millions of rows.)

From: Will W [EMAIL PROTECTED] Reply-To: Will W [EMAIL PROTECTED] To: Steve Whittle [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Re: Comparing two files Date: Sat, 9 Jun 2001 07:35:05 -0700
Comparing two files
Hi, I'm trying to write a script that removes duplicates between two files and writes the unique values to a new file. For example, I have one file, file 1:

red
green
blue
black
grey

and another, file 2:

black
red

and I want to create a new file that contains:

green
blue
grey

I have written a script that takes each entry in file 1 and then reads through file 2 to see if it exists there; if not, it writes it to a new file. If there is a duplicate, nothing is written to the new file. The real file 1 I'm dealing with has more than 2 million rows and the real file 2 has more than 100,000 rows, so I don't think my method is very efficient. I've looked through the web and Perl references and can't find an easier way. Am I missing something? Any ideas? Thanks, Steve Whittle
Re: Comparing two files
Hi, one approach is to sort the files first and work with the sorted files; you then need to read each of them only once. A second approach is to load the smaller file into memory, creating a hash with something like:

while (<FILE1>) {
    chomp;
    $found1{$_}++;
}

and then read the second file and compare:

while (<FILE2>) {
    chomp;
    if ($found1{$_}) {
        $found_both{$_}++;
    }
    else {
        print "$_\n";
    }
}

foreach (keys %found1) {
    print "$_\n" unless $found_both{$_};
}

but this will consume a lot of memory.

On Wednesday 06 June 2001 17:46, Steve Whittle wrote about comparing two files (quoted message trimmed).

-- Ondrej Par Internet Securities Software Engineer e-mail: [EMAIL PROTECTED] Phone: +420 2 222 543 45 ext. 112
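The first approach mentioned above, sorting both files and then making a single pass, can be sketched as a merge over two sorted lists. This is an illustrative sub of my own (array refs stand in for pre-sorted file handles); it returns the lines unique to the first list, which is exactly Steve's desired output:

```perl
use strict;
use warnings;

# Given two refs to sorted string arrays, return the elements that
# appear in the first but not the second, in one linear pass and
# without holding either dataset in a hash.
sub uniq_to_first_sorted {
    my ($first, $second) = @_;
    my @only_first;
    my ($i, $j) = (0, 0);
    while ($i < @$first) {
        if ($j >= @$second or $first->[$i] lt $second->[$j]) {
            push @only_first, $first->[$i];   # nothing in 2nd can match
            $i++;
        }
        elsif ($first->[$i] eq $second->[$j]) {
            $i++;                             # shared: skip it
            $j++;
        }
        else {
            $j++;                             # advance the 2nd list
        }
    }
    return @only_first;
}
```

For files too big for memory, the same merge works while reading line by line from two handles opened on the sorted files (e.g. after an external sort(1)).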
Re: Comparing two files
Reading the files into hashes is definitely the answer. If you need an example, look at section 5.11 of the Perl Cookbook. -Bob

--- Eduard Grinvald [EMAIL PROTECTED] wrote: As suggested earlier for a similar problem, dumping everything into a single hash as keys and then printing that out seems simple and efficient enough.

=sincerely, eduard grinvald
=email: [EMAIL PROTECTED]
=dev-email: [EMAIL PROTECTED]
=dev-site: r-x-linux.sourceforge.net
=icq: 114099136

- Original Message - From: Steve Whittle [EMAIL PROTECTED] Date: Wednesday, June 6, 2001 11:46 am Subject: Comparing two files
Re: Comparing two files
Unless I am missing the point of the question, this seems to me like an intersection-of-arrays problem, which is covered under hashes in every Perl book that I have seen. Basically:

%seen = ();
foreach (@array1) {
    $seen{$_} = 1;
}
@intersection = grep($seen{$_}, @array2);

---
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin

Gary Luther
Systems Administrator, Computer Center
2500 Broadway, Helena, MT 59602
[EMAIL PROTECTED]
Visit our website at http://www.safmt.org
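The same intersection can be written with a hash slice, so marking every element of the first array as seen becomes a single statement. A short self-contained sketch (the sample arrays are illustrative):

```perl
use strict;
use warnings;

my @array1 = qw(MICKEY MINNIE DONALD GOOFY);
my @array2 = qw(MINNIE DONALD PLUTO);

# Hash slice: assign 1 to $seen{MICKEY}, $seen{MINNIE}, ... at once.
my %seen;
@seen{@array1} = (1) x @array1;

# Keep the elements of @array2 that were marked seen.
my @intersection = grep { $seen{$_} } @array2;
print "@intersection\n";   # prints: MINNIE DONALD
```

The slice form does the same work as the foreach loop above; it is just the idiomatic shorthand for bulk-initializing a lookup hash.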