Re: Find regex in Start/Stop segments

Tassilo von Parseval Wed, 11 Jun 2003 01:55:28 -0700

On Tue, Jun 10, 2003 at 11:49:25PM -0700 Harry Putnam wrote:

> I use a homeboy data base technique to keep info about the scripts I
> write and other types of stuff too.  Here I'm just dealing with
> scripts.
> 
> It's a simple format to enter key information about what a script
> does.  Looks like:
> 
> # Keywords: SOME WORDS
> # body
> # body
> # DATE
> # &&
> 
> I've written various scripts to search this format in awk and shell.
> Now trying it in perl.  I have several working scripts but wanted to
> get some ideas from the sharp shooters here how to do this better.
> 
> My technique seems like it could be streamlined and improved quite a
> lot.


Yes, it's a little wordy considering it's Perl.

> The sample below just handles the basic technique and isn't completed
> with all tests and etc.  Just some basic ones. But really I'm more
> interested in hearing better ways to accomplish this.
> 
> The basic task is to locate a formatted segment, search its keywords
> line for regex then print the segment.  Also a basic check for
> misformatted segments.
> 
> Not too concerned with how the files are acquired but what comes after.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> #!/usr/local/bin/perl -w
> 
> ($myscript = $0) =~ s:^.*/::; 

You are allowed to manipulate $0, too. The new value of $0 is the one
that eventually shows up in your process table (unless you are using
Perl 5.8.0, where this does not work due to a bug).
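
A minimal sketch of that, using the same substitution as above but
applied to $0 itself. The helper name basename_of is made up for this
example, and whether the new name is visible in ps depends on the
platform.

```perl
use strict;
use warnings;

# Hypothetical helper: strip everything up to and including the last slash.
sub basename_of {
    my ($path) = @_;
    (my $name = $path) =~ s:^.*/::;
    return $name;
}

# Assigning to $0 is allowed; no separate $myscript variable is needed.
$0 = basename_of($0);
print "running as $0\n";
```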

> $regex = shift;
> ## Set Keywords start end regex for non script searching (The default)
> $keyreg = '^# Keywords:';
> $keyend = '^# &&$';
> 
> if (!$ARGV[0]) {
>   usage();
>   exit;
> }
> ## Aquire there files in whatever way
> @files = @ARGV;
> 
> ## Set a marker to know when we are in a new file
> $fname_for_line_cnt = '';
> for (@files) { 
>   chomp;

I don't think that the entries in @ARGV contain newlines at the end.
Actually I know they don't. :-)

>   $file = $_;
>   if ("$fname_for_line_cnt" eq "$file") {

There is no reason to put those variables into quotes.

>    ## This shouldn't happen
>     print "We're reading the same file again .. exiting\n";
>     exit;

That is better solved using a hash. Put all the file names into a hash
(as keys) and iterate over the keys. That way, it's guaranteed that you
inspect each file only once.
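
Roughly like this, as a sketch only. The helper name unique_files is
invented for the example. Note that a hash does not remember the order
of the command line, so the keys are sorted here to get a predictable
order.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct names from a list.
sub unique_files {
    my %seen;
    @seen{ @_ } = ();         # hash slice: each name becomes a key, duplicates collapse
    return sort keys %seen;   # sorted, because hash order is not predictable
}

print "$_\n" for unique_files(qw(a.pl b.pl a.pl));
```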

>   } else {
>     ## Set lineno to 0 for start of each file
>     $lineno = 0;
>     $fname_for_line_cnt = $file;
>   }
> 
>   if (-f $file) {
>     open(FH,"<$file") or die "Cannot open $file: $!";
>     while (<FH>) {
>       chomp;
>       $lineno++;

You don't have to keep track of the line numbers yourself. Perl offers
the special variable $. for that.
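
A small sketch of $. in use. It reads from an in-memory filehandle
(which needs Perl 5.8 or later) so that it runs without any files on
disk; number_lines is a name made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each line of a string with its line number.
sub number_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        push @out, "$. $line";   # $. is the line number of the last handle read
    }
    close $fh;                   # an explicit close resets $.
    return @out;
}

print "$_\n" for number_lines("# Keywords: foo\n# body\n# &&\n");
```

Keep in mind that $. is only reset by an explicit close; re-opening the
same handle without closing it lets the count run on.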

>       $line = $_;
>       if (/$keyreg $regex/) {
>         print "$file\n";
>       $hit = "TRUE";
>       }
>       if ($hit) {
>       print "$lineno $line\n";
>       }
>       if ($hit && /$keyend/) {
>       ## We've hit the end of a good segment, print delimiter and null
>       ## out our vars
>       print "-- \n";
>       $hit        = '';
>       $line       = '';
>       }
>       if ($hit && /^[^#]/ || $hit && eof) {
>         ## If we see this situation it means the format is screwed up
>         ## Notify user of the line number, but null out vars and proceed. 
>       print  "$file:\n   INCOMPLETE SEGMENT ENTRY: Line <$lineno>\n --\n";
>       $hit        = '';
>       $line       = '';
>       }
>     }
>     close(FH);
>   } else {
>     next;
>   }
> }
> sub usage {
>   print<<EOM;
> 
> Purpose: Search scripts keyword segments (or any file)
> Usage: \`$myscript "REGEX" file ... fileN (or glob)'
>       (Where REGEX is a regex to be found in Keyword segment) 
> 
> EOM
> }

I'd probably write it like this:

    #!/usr/local/bin/perl -w
    use strict;
    
    $0 =~ s:.*/::;
    
    my $regex = shift;
    $regex = qr/^# Keywords: $regex/;   # could improve performance a little
    
    my %files;
    @files{ @ARGV } = ();               # a hash-slice: see 'perldoc perldata'
    
    usage(), exit if ! @ARGV;

    for my $file (keys %files) {
        next if ! -f $file;
        open FILE, "<", $file or die "Error opening $file: $!";

        my $hit = 0;
        while (<FILE>) {
            chomp;
            $hit = 1 if /$regex/o;      # start of record
            print "$. $_\n" if $hit;    # $. is the line number
            $hit = 0 if /^# &&$/;       # end of record
            print "$file:\n\tINCOMPLETE SEGMENT ENTRY: Line <$.>\n--\n"
                and $hit = 0
                if $hit && !/^#/ or $hit && eof;
        }
        close FILE;                     # explicit close resets $. for the next file
    }

    sub usage {
        ...
    }
       
I didn't test it, but it should produce the same result as your script
and do it considerably more quickly. Please subtract any possible
syntax errors or logical flaws from the script before running it. ;-)
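
If you want something you can check without touching real files, the
same start/stop logic can be put into a function that reads from any
filehandle and returns the lines it would print. This is only a sketch:
find_segments is a made-up name, the sample data is invented, and the
in-memory filehandle assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: return the output lines for every segment whose
# Keywords line matches $regex, flagging segments that never reach "# &&".
sub find_segments {
    my ($fh, $regex) = @_;
    my $start = qr/^# Keywords: $regex/;
    my ($hit, @out) = (0);
    while (my $line = <$fh>) {
        chomp $line;
        $hit = 1 if $line =~ $start;        # start of record
        next unless $hit;
        push @out, "$. $line";
        if ($line =~ /^# &&$/) {            # end of record
            push @out, "--";
            $hit = 0;
        }
        elsif ($line !~ /^#/ or eof $fh) {  # segment was never closed
            push @out, "INCOMPLETE SEGMENT ENTRY: Line <$.>", "--";
            $hit = 0;
        }
    }
    return @out;
}

my $data = "# Keywords: backup rsync\n# body\n# &&\nprint 1;\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for find_segments($fh, qr/backup/);
close $fh;
```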

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval

