Jan Eden <[EMAIL PROTECTED]> wrote:

: I had the following task: Open a file, read it and merge all
: pairs of lines containing a certain number of tabs. Example:
: 
: Blablabla
: abc cab bca
: 123 453 756
: Blablabla
: Blablabla
: 
: Here, lines 2 and 3 should be merged, while the other
: lines should remain untouched. Expected result:
: 
: Blablabla
: abc 123
: cab 453
: bca 756
: Blablabla
: Blablabla
: 
: While I managed to get this done, I doubt that I found a good
: (fast) solution. So before I move on to the large files which
: have to be processed, I'd like to get your input for a better
: solution.

    I'm not certain fast and good are always the same thing
in programming. Certainly, a fast solution is preferred to a
slow one. But if the fast solution doesn't scale, or if it is
hard to understand, it may not be a Good Thing.


    From an input of:

Blablabla
abc     cab     bca
123     453     756
Blablabla
Blablabla
Blablabla
abc     cab     bca
123     453     756
Blablabla
Blablabla

    From your explanation I expected to get:

Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla
Blablabla
abc 123
cab 453
bca 756
Blablabla
Blablabla

    When I ran the script, I got the following output:

Blablabla
123 abc
453 cab
756 bca

Blablabla
Blablabla
Blablabla
123 abc
453 cab
756 bca

Blablabla
Blablabla

     So which do you really want?



: This is how I did it:

    It looks like a great first effort. You have
obviously picked up the basics.

 
: #!/usr/bin/perl -w
: 
: use strict;
: 
: my (@merge_one, @merge_two, @merge_three);

    These don't need to be declared in this scope. True,
keeping them outside the loop probably makes the loop a
little faster, but it isn't a Good Thing.
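
    Just as a sketch (borrowing your variable names), I'd
declare them in the smallest block that uses them, so they
start out empty on each pass and go away when the block ends:

foreach ( 0 .. $#input_file ) {
    my (@merge_one, @merge_two, @merge_three);

    # ... split, merge and print here ...
}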

 
: open (FILE, "file.txt") or die "Cannot open the input file";

    I like to include the perl error variable ($!)
in the 'die'. I also tend to use the same idiom to
open and close all files.

my $file = 'file.txt';
open FILE, $file or die qq(Cannot open "$file": $!);

my @input_file = <FILE>;

close FILE;
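
    If your perl is new enough, a lexical filehandle with the
three-argument form of open works just as well -- this is only
an alternative sketch, not a correction of the idiom above:

my $file = 'file.txt';
open my $fh, '<', $file or die qq(Cannot open "$file": $!);

my @input_file = <$fh>;

close $fh;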

: 
: my @input_file = <FILE>;
: 
: foreach (0..$#input_file) {
:     chomp $input_file[$_];

    We could chomp the whole file at once by using this.

chomp( my @input_file = <FILE> );

close FILE;

foreach ( 0 .. $#input_file ) {


:     my $next = $_+ 1;

    Instead of remembering that $_ is the current line, let's
do this. Let's also not process the last line.

foreach my $current_line ( 0 .. $#input_file - 1 ) {

    my $next_line = $current_line + 1;


:     chomp $input_file[$next] if $input_file[$next];

    We already chomped everything.


:     if ($input_file[$_] =~ m/\t/ && $input_file[$next] =~ m/\t/) {

    I get confused when reading across conditional lines.
This seems much clearer to me. YMMV.

    if (    $input_file[$current_line] =~ m/\t/
        &&  $input_file[$next_line]    =~ m/\t/ ) {


    Actually, I tend to be indentation averse. I would
probably skip to the next line on failure.

    next unless
            $input_file[$current_line] =~ m/\t/
        &&  $input_file[$next_line]    =~ m/\t/;


:         @merge_one = split /\t/, $input_file[$_];
:         @merge_two = split /\t/, $input_file[$next];
:         for ( 0 .. $#merge_two ) {
:             $merge_three[$_] = $merge_two[$_] . " " . $merge_one[$_];
:         }
:         $input_file[$_] = join "\n", @merge_three;
:         print $input_file[$_], "\n\n";
:         $input_file[$next] = '';
:         (@merge_one, @merge_two, @merge_three) = ();
:     }
: }
 

    I'd probably use a subroutine for the merge.

    @input_file[$current_line, $next_line]
        = merge( @input_file[$next_line, $current_line] );
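    # ( note the slice on the right is swapped: merge() gets
    #   the next line first, which is why the merged fields
    #   start with the numbers, as in the output shown above )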

}

sub merge {
    my @first_line = split /\t/, $_[0];
    my @next_line  = split /\t/, $_[1];

    my @merged = map
         "$first_line[$_] $next_line[$_]",
            0 .. $#first_line;

    my $second = pop @merged;
    my $first  = join "\n", @merged;

    return ( $first, $second );
}


    That sub does just about the same thing yours did. It
still uses three arrays, but it doesn't add blank lines to
the output.
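
    A quick sanity check of that sub, using made-up strings
that match your example (tabs written as \t):

my ( $first, $second ) = merge( "123\t453\t756", "abc\tcab\tbca" );

print "$first\n";     # prints "123 abc" and "453 cab"
print "$second\n";    # prints "756 bca"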


: my $output = join "\n", @input_file;
:
: open (OUTFILE, ">input_file2.txt");
: print OUTFILE $output;

    I think you would be happier with a final newline
in the file.

print OUTFILE "$output\n";



    Here's the re-write I did for your script.

my $file = 'in.txt';
open FH, $file or die qq(Cannot open "$file": $!);

chomp( my @input_file = <FH> );

close FH;

foreach my $current ( 0 .. $#input_file - 1 ) {

    @input_file[$current, $current + 1] =

        merge( @input_file[$current + 1, $current] )

        if      $input_file[$current    ] =~ /\t/
            &&  $input_file[$current + 1] =~ /\t/;

}

$file = 'out.txt';

open FH, ">$file" or die qq(Cannot open "$file": $!);
print FH "$_\n" foreach @input_file;
close FH;


sub merge {
    my @first_fields = split /\t/, $_[0];
    my @next_fields  = split /\t/, $_[1];

    my @merged = map
         "$first_fields[$_] $next_fields[$_]",
            0 .. $#first_fields;

    my $second = pop @merged;
    my $first  = join "\n", @merged;

    return ( $first, $second );
}

__END__


    To speed up the algorithm I would play with the merge
sub. For example, this version does the same thing with one
array. It's a little less readable, however. You can use
Benchmark.pm to compare speeds; a sketch of that follows the
sub below.

sub merge {
    my @merged = split /\t/, $_[0];

    @merged = map
        shift( @merged ) . " $_",
            split /\t/, $_[1];
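    # shift() empties the original @merged while the new list
    # is built; the list assignment only happens afterward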

    my $merged = join( $/, @merged[0 .. $#merged - 1] );
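    # $merged (a scalar) and $merged[-1] (the last element of
    # the array @merged) are two different variables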

    return ( $merged, $merged[-1] );
}
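
    Here is roughly how I'd compare the two versions with
Benchmark.pm. The sub names merge_three_arrays() and
merge_one_array() are just labels I made up so both versions
can live in the same test script:

use Benchmark qw(cmpthese);

# rename the two subs above to merge_three_arrays() and
# merge_one_array() so they can sit side by side
my @sample = ( "123\t453\t756", "abc\tcab\tbca" );

cmpthese( -2, {
    three_arrays => sub { merge_three_arrays( @sample ) },
    one_array    => sub { merge_one_array( @sample ) },
} );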

    In production code I'd probably use the first sub above.
I think it would be easier to understand in 6 or 8 months.

HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328



