Jan Eden <[EMAIL PROTECTED]> wrote:
: I had the following task: Open a file, read it and merge all
: pairs of lines containing a certain number of tabs. Example:
:
: Blablabla
: abc cab bca
: 123 453 756
: Blablabla
: Blablabla
:
: Here, lines 2 and three should be merged, while the other
: lines should remain untouched. Expected result:
:
: Blablabla
: abc 123
: cab 453
: bca 756
: Blablabla
: Blablabla
:
: While I managed to get this done, I doubt that I found a good
: (fast) solution. So before I move on to the large files which
: have to be processed, I'd like to get your input for a better
: solution.
I'm not certain fast and good are always the same thing in
programming. Certainly, a fast solution is preferred to a slow
solution. But if the fast solution doesn't scale or if it is
hard to understand it may not be a Good Thing.
From an input of:
Blablabla
abc cab bca
123 453 756
Blablabla
Blablabla
Blablabla
abc cab bca
123 453 756
Blablabla
Blablabla
From your explanation I expected to get:
Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla
Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla
When I ran the script I got the following output.
Blablabla
123 abc
453 cab
756 bca
Blablabla
Blablabla
Blablabla
123 abc
453 cab
756 bca
Blablabla
Blablabla
So which do you really want?
: This is how I did it:
It looks like a great first effort. You have
obviously picked up the basics.
: #!/usr/bin/perl -w
:
: use strict;
:
: my (@merge_one, @merge_two, @merge_three);
These don't need to be in this scope. True, the loop is
probably faster, but it isn't a Good Thing.
: open (FILE, "file.txt") or die "Cannot open the input file";
I like to include the perl error variable ($!)
in the 'die'. I also tend to use the same idiom to
open and close all files.
my $file = 'file.txt';
open FILE, $file or die qq(Cannot open "$file": $!);
my @input_file = <FILE>;
close FILE;
:
: my @input_file = <FILE>;
:
: foreach (0..$#input_file) {
: chomp $input_file[$_];
We could chomp the whole file at once by using this.
chomp( my @input_file = <FILE> );
close FILE;
foreach ( 0 .. $#input_file ) {
: my $next = $_+ 1;
Instead of remembering that $_ is the current line Let's
do this. Let's also not process the last line.
foreach my $current_line ( 0 .. $#input_file - 1 ) {
my $next_line = $current_line + 1;
: chomp $input_file[$next] if $input_file[$next];
We already chomped everything.
: if ($input_file[$_] =~ m/\t/ && $input_file[$next] =~ m/\t/) {
I get confused when reading across conditional lines.
This seems much more clear to me. YMMV.
if ( $input_file[$current_line] =~ m/\t/
&& $input_file[$next_line] =~ m/\t/ ) {
Actually, I tend to be indentation adverse. I would
probably skip to the next line on failure.
next unless
$input_file[$current_line] =~ m/\t/
&& $input_file[$next_line] =~ m/\t/;
: @merge_one = split /\t/, $input_file[$_];
: @merge_two = split /\t/, $input_file[$next];
: for ( 0 .. $#merge_two ) {
: $merge_three[$_] = $merge_two[$_] . " " . $merge_one[$_];
: }
: $input_file[$_] = join "\n", @merge_three;
: print $input_file[$_], "\n\n";
: $input_file[$next] = '';
: (@merge_one, @merge_two, @merge_three) = ();
: }
: }
I'd probably use a subroutine for the merge.
@input_file[$current_line, $next_line]
= merge( @input_file[$next_line, $current_line] );
}
sub merge {
my @first_line = split /\t/, $_[0];
my @next_line = split /\t/, $_[1];
my @merged = map
"$first_line[$_] $next_line[$_]",
0 .. $#first_line;
my $second = pop @merged;
my $first = join "\n", @merged;
return ( $first, $second );
}
That sub does just about the same that yours did,
It uses three arrays, but it doesn't add blank lines to
the output.
: my $output = join "\n", @input_file;
:
: open (OUTFILE, ">input_file2.txt");
: print OUTFILE $output;
I think you would be happier with a final newline
in the file.
print OUTFILE "$output\n"
Here's the re-write I did for your script.
my $file = 'in.txt';
open FH, $file or die qq(Cannot open "$file": $!);
chomp( my @input_file = <FH> );
close FH;
foreach my $current ( 0 .. $#input_file - 1 ) {
@input_file[$current, $current + 1] =
merge( @input_file[$current + 1, $current] )
if $input_file[$current ] =~ /\t/
&& $input_file[$current + 1] =~ /\t/;
}
$file = 'out.txt';
open FH, ">$file" or die qq(Cannot open "$file": $!);
print FH "$_\n" foreach @input_file;
close FH;
sub merge {
my @first_fields = split /\t/, $_[0];
my @next_fields = split /\t/, $_[1];
my @merged = map
"$first_fields[$_] $next_fields[$_]",
0 .. $#first_fields;
my $second = pop @merged;
my $first = join "\n", @merged;
return ( $first, $second );
}
__END__
To speed the algorithm I would play with the
merge sub. For example this does the same thing
with one array. It's a little less readable however.
You can use Benchmark.pm to compare speeds.
sub merge {
my @merged = split /\t/, $_[0];
@merged = map
shift( @merged ) . " $_",
split /\t/, $_[1];
my $merged = join( $/, @merged[0 .. $#merged - 1] );
return ( $merged, $merged[-1] );
}
In production code I'd probably use the first
sub above. I think it would easier to understand in
6 or 8 months.
HTH,
Charles K. Clarkson
--
Mobile Homes Specialist
254 968-8328
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>