Re: grab pattern from start and end block

Rob Dixon Fri, 12 Jul 2013 05:50:03 -0700

On 12/07/2013 12:44, Agnello George wrote:

hi


i have raw data that is like this in a flat file .

start
name:agnello
dob:2 april
address:123 street
end
start
name:babit
dob:13 april
address:3 street
end
start
name:ganesh
dob:1 april
address:23 street
end


i need to get the data in the following format

name:agnello,dob:2 april,address:123 street
name:babit,dob:13 april,address:3 street
name:ganesh,dob:1 april,address:23 street

i came up with this , is there a better way to do this :
===============================
#!/usr/bin/perl

use strict;
use warnings;

open my $FH , 'data.txt' or die "cannot open file $!";
read $FH, my $string, -s $FH;
close($FH);


my @string = split ( /start/ , $string ) ;

my %data;

foreach  ( @string ) {
chomp;
next if /^$/ ;
s/^ $//g;
s/end//;

my @data = split(/\n/, "$_");
   foreach my $i (@data) {
    print "$i,";

      }
print "\n";
}


Hi Agnello

Your code (almost) works, but it isn't very Perlish. Here are some
comments that I hope will help you.

- Your open is good because it uses a lexical file handle, checks the
success of the open, and puts $! in the die string. But the file handle
identifier should be lower case (upper case is reserved for globals) and
you should have an open mode '<' for the second parameter.

- That is an unconventional way to read the entire file into memory.
Usually you would temporarily undefine the input record separator $/ and
just use <$fh> to read the entire file in one go.

- @string is a poor choice of identifier for a list of records.

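- `next if /^$/` does less than it appears to. Its only effect is to skip
an empty field, and the only one here is the field that split returns
before the first `start` (or between two `start` lines in succession).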

- s/^ $//g deletes the whole record if it is a single space. That can't
happen. I think you meant s/^ // to delete spaces from the beginning of
the record. Better still is s/^\s+//, which will also delete the newline
remaining after `start`.

- It is almost never correct to put scalar variables on their own inside
double quotes. You want `split(/\n/, $_)` or, since $_ is the default
parameter, just `split /\n/`.

- $i is a poor choice of identifier for lines in a record. $i is usually
used for an array index.
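
A small sketch of why the double-quote point matters, using made-up
sample data: quoting a lone scalar forces it to a string, which is
merely redundant for text but silently destroys a reference.

```perl
use strict;
use warnings;

# Hypothetical data, chosen only for illustration.
my $record = [ 'name:agnello', 'dob:2 april' ];

my $same   = $record;      # still an array reference
my $quoted = "$record";    # now a plain string such as "ARRAY(0x...)"

print ref($same)   ? "unquoted: reference\n" : "unquoted: plain string\n";
print ref($quoted) ? "quoted: reference\n"   : "quoted: plain string\n";

# For an ordinary string the quotes change nothing, so they are just noise.
my $line         = "name:babit\ndob:13 april";
my @without      = split /\n/, $line;
my @with_quotes  = split /\n/, "$line";
print "same fields either way\n" if "@without" eq "@with_quotes";
```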

Combining those, your program looks like this

use strict;
use warnings;

my $string;
{
  open my $fh, '<', 'data.txt' or die "cannot open file $!";
  local $/;
  $string = <$fh>;
}

my @records = split /start/, $string;

foreach (@records) {
  chomp;
  next unless /\S/;   # skip the empty field before the first 'start'
  s/^\s+//;
  s/end//;

  my @data = split /\n/;
  print "$_," for @data;

  print "\n";
}
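
Note that the tidied program still ends every output line with a
trailing comma, just as the original did. Here is a minimal sketch of
one way to avoid that with join. It assumes the text is held in a string
rather than read from data.txt, and format_records is a made-up name
for illustration.

```perl
use strict;
use warnings;

# Sample input held in a string so the sketch runs without data.txt.
my $string = <<'END_DATA';
start
name:agnello
dob:2 april
address:123 street
end
start
name:babit
dob:13 april
address:3 street
end
END_DATA

# format_records is a hypothetical helper name, not part of the original post.
sub format_records {
  my ($text) = @_;
  my @records = split /start/, $text;
  my @lines;
  for my $record (@records) {
    $record =~ s/end//;
    # Keep only non-blank lines. This also empties out the field that
    # split returns before the first 'start'.
    my @fields = grep { /\S/ } split /\n/, $record;
    next unless @fields;
    push @lines, join(',', @fields);   # commas between fields only
  }
  return @lines;
}

print "$_\n" for format_records($string);
```

This is still the split-on-`start` approach, so it shares its weakness:
a field value that happens to contain `start` or `end` would be mangled.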

As for how I would do it, I am always inclined to read data into an
internal Perl data structure and then output it again. That allows for
manipulation of the data before it is displayed and gives far more
control over the output.

This is what I would write. It keeps the data in each record in hash
%data, which is emptied when a `start` line is seen and printed when an
`end` line is seen.

use strict;
use warnings;

open my $fh, '<', 'data.txt' or die $!;

my %data;
while (<$fh>) {
  if (/start/) {
    %data = ();
  }
  elsif (/end/) {
    print join(',', map "$_:$data{$_}", qw/ name dob address /), "\n";
  }
  else {
    chomp;
    my ($k, $v) = split /:/, $_, 2;   # limit 2 keeps any colons in the value
    $data{$k} = $v if defined $v;     # a plain truth test would drop "0"
  }
}

giving the output

name:agnello,dob:2 april,address:123 street
name:babit,dob:13 april,address:3 street
name:ganesh,dob:1 april,address:23 street
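
To illustrate the point about manipulating the data before it is
displayed, here is a hedged sketch that stores every record and sorts by
name before printing. The helper name parse_records and the in-memory
sample lines are assumptions made for the example; the program above
prints each record as soon as its `end` line is seen.

```perl
use strict;
use warnings;

# parse_records is a hypothetical helper: it takes a list of input lines
# and returns one hash reference per start/end block.
sub parse_records {
  my @lines = @_;
  my (@records, %data);
  for (@lines) {
    chomp;
    if (/^start/) {
      %data = ();
    }
    elsif (/^end/) {
      push @records, { %data };   # store a copy, because %data is reused
    }
    else {
      my ($k, $v) = split /:/, $_, 2;
      $data{$k} = $v if defined $v;
    }
  }
  return @records;
}

# Sample lines, deliberately out of alphabetical order.
my @input = map { "$_\n" } (
  'start', 'name:ganesh',  'dob:1 april', 'address:23 street',  'end',
  'start', 'name:agnello', 'dob:2 april', 'address:123 street', 'end',
);

my @records = parse_records(@input);

# Sort by name, then print in the same layout as before.
for my $rec (sort { $a->{name} cmp $b->{name} } @records) {
  print join(',', map "$_:$rec->{$_}", qw/ name dob address /), "\n";
}
```

Because each record is an ordinary hash, filtering or reordering fields
is a matter of changing the sort block or the qw// list.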




