Jan Eden <[EMAIL PROTECTED]> wrote:

: I had the following task: Open a file, read it and merge all
: pairs of lines containing a certain number of tabs. Example:
:
: Blablabla
: abc cab bca
: 123 453 756
: Blablabla
: Blablabla
:
: Here, lines 2 and 3 should be merged, while the other
: lines should remain untouched. Expected result:
:
: Blablabla
: abc 123
: cab 453
: bca 756
: Blablabla
: Blablabla
:
: While I managed to get this done, I doubt that I found a good
: (fast) solution. So before I move on to the large files which
: have to be processed, I'd like to get your input for a better
: solution.
I'm not certain fast and good are always the same thing in
programming. Certainly, a fast solution is preferred to a slow
solution. But if the fast solution doesn't scale, or if it is hard
to understand, it may not be a Good Thing.

From an input of:

Blablabla
abc cab bca
123 453 756
Blablabla
Blablabla
Blablabla
abc cab bca
123 453 756
Blablabla
Blablabla

From your explanation I expected to get:

Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla
Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla

When I ran the script I got the following output:

Blablabla
123 abc
453 cab
756 bca
Blablabla
Blablabla
Blablabla
123 abc
453 cab
756 bca
Blablabla
Blablabla

So which do you really want?

: This is how I did it:

It looks like a great first effort. You have obviously picked up
the basics.

: #!/usr/bin/perl -w
:
: use strict;
:
: my (@merge_one, @merge_two, @merge_three);

These don't need to be in this scope. True, declaring them outside
the loop is probably faster, but it isn't a Good Thing.

: open (FILE, "file.txt") or die "Cannot open the input file";

I like to include the perl error variable ($!) in the 'die'. I
also tend to use the same idiom to open and close all files.

    my $file = 'file.txt';
    open FILE, $file or die qq(Cannot open "$file": $!);
    my @input_file = <FILE>;
    close FILE;

:
: my @input_file = <FILE>;
:
: foreach (0..$#input_file) {
:     chomp $input_file[$_];

We could chomp the whole file at once by using this.

    chomp( my @input_file = <FILE> );
    close FILE;

    foreach ( 0 .. $#input_file ) {

:     my $next = $_+ 1;

Instead of remembering that $_ is the current line, let's do this.
Let's also not process the last line.

    foreach my $current_line ( 0 .. $#input_file - 1 ) {
        my $next_line = $current_line + 1;

:     chomp $input_file[$next] if $input_file[$next];

We already chomped everything.

:     if ($input_file[$_] =~ m/\t/ && $input_file[$next] =~ m/\t/) {

I get confused when reading across conditional lines. This seems
much more clear to me. YMMV.
        if (    $input_file[$current_line] =~ m/\t/
             && $input_file[$next_line]    =~ m/\t/ ) {

Actually, I tend to be indentation averse. I would probably skip
to the next line on failure.

        next unless $input_file[$current_line] =~ m/\t/
                 && $input_file[$next_line]    =~ m/\t/;

:         @merge_one = split /\t/, $input_file[$_];
:         @merge_two = split /\t/, $input_file[$next];
:         for ( 0 .. $#merge_two ) {
:             $merge_three[$_] = $merge_two[$_] . " " . $merge_one[$_];
:         }
:         $input_file[$_] = join "\n", @merge_three;
:         print $input_file[$_], "\n\n";
:         $input_file[$next] = '';
:         (@merge_one, @merge_two, @merge_three) = ();
:     }
: }

I'd probably use a subroutine for the merge.

        @input_file[$current_line, $next_line]
            = merge( @input_file[$next_line, $current_line] );
    }

    sub merge {
        my @first_line = split /\t/, $_[0];
        my @next_line  = split /\t/, $_[1];
        my @merged = map "$first_line[$_] $next_line[$_]", 0 .. $#first_line;
        my $second = pop @merged;
        my $first  = join "\n", @merged;
        return ( $first, $second );
    }

That sub does just about the same thing yours did. It uses three
arrays, but it doesn't add blank lines to the output.

: my $output = join "\n", @input_file;
:
: open (OUTFILE, ">input_file2.txt");
: print OUTFILE $output;

I think you would be happier with a final newline in the file.

    print OUTFILE "$output\n";

Here's the re-write I did for your script.

    my $file = 'in.txt';
    open FH, $file or die qq(Cannot open "$file": $!);
    chomp( my @input_file = <FH> );
    close FH;

    foreach my $current ( 0 .. $#input_file - 1 ) {
        @input_file[$current, $current + 1]
            = merge( @input_file[$current + 1, $current] )
                if $input_file[$current    ] =~ /\t/
                && $input_file[$current + 1] =~ /\t/;
    }

    $file = 'out.txt';
    open FH, ">$file" or die qq(Cannot open "$file": $!);
    print FH "$_\n" foreach @input_file;
    close FH;

    sub merge {
        my @first_fields = split /\t/, $_[0];
        my @next_fields  = split /\t/, $_[1];
        my @merged = map "$first_fields[$_] $next_fields[$_]",
            0 .. $#first_fields;
        my $second = pop @merged;
        my $first  = join "\n", @merged;
        return ( $first, $second );
    }

    __END__

To speed the algorithm I would play with the merge sub. For
example, this does the same thing with one array. It's a little
less readable, however. You can use Benchmark.pm to compare
speeds.

    sub merge {
        my @merged = split /\t/, $_[0];
        @merged = map shift( @merged ) . " $_", split /\t/, $_[1];
        my $merged = join( $/, @merged[0 .. $#merged - 1] );
        return ( $merged, $merged[-1] );
    }

In production code I'd probably use the first sub above. I think
it would be easier to understand in 6 or 8 months.

HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
254 968-8328

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>
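To make the Benchmark.pm suggestion above concrete, here is a minimal sketch that compares the three-array and one-array merge subs from the thread with cmpthese. The sample tab-separated lines, the sub names, and the iteration count are my own assumptions, not part of the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Three-array version, as in the re-write above.
sub merge_three_arrays {
    my @first_fields = split /\t/, $_[0];
    my @next_fields  = split /\t/, $_[1];
    my @merged = map "$first_fields[$_] $next_fields[$_]",
        0 .. $#first_fields;
    my $second = pop @merged;
    my $first  = join "\n", @merged;
    return ( $first, $second );
}

# One-array version, as suggested at the end of the post.
sub merge_one_array {
    my @merged = split /\t/, $_[0];
    @merged = map shift( @merged ) . " $_", split /\t/, $_[1];
    my $merged = join( $/, @merged[0 .. $#merged - 1] );
    return ( $merged, $merged[-1] );
}

# Sample data (assumed): one pair of tab-separated lines.
my $line_one = "abc\tcab\tbca";
my $line_two = "123\t453\t756";

# Run each sub 100_000 times and print a comparison table.
cmpthese( 100_000, {
    three_arrays => sub { merge_three_arrays( $line_one, $line_two ) },
    one_array    => sub { merge_one_array( $line_one, $line_two ) },
} );
```

Both subs should return identical results for the same input, so the comparison is apples to apples; only the internal bookkeeping differs.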