siegfried wrote:

?????? wrote:

siegfried wrote:

I need to search large amounts of source code and grep is not doing the
job. The problem is that I keep matching stuff in the comments of the
C++/Java/Perl/Groovy/Javascript source code.

Can someone give me some hints on where I might start on rewriting grep
in perl so that it ignores the contents of /* and */ comments?

Instead of rewriting grep, consider writing a comment filter.  Have it
read from standard input and write to standard output; pipe the file
that you want to grep into it, and pipe its output into grep.
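
For instance, a minimal filter might look something like this (just a
sketch: it blanks /* ... */ and // comments but keeps their newlines so
line numbers downstream still match, and it makes no attempt to handle
comment markers inside string literals):

#!/usr/bin/perl
use strict;
use warnings;

local $/;                 # slurp all of standard input
my $code = <STDIN>;

# Strip /* ... */ and // comments, keeping only their newlines
# so that line numbers still match the original file.
$code =~ s{(/\*.*?\*/|//[^\n]*)}{ ( my $c = $1 ) =~ tr/\n//cd; $c }gse;

print $code;

You would save that as, say, comment_filter.pl (the name is only an
example) and run something like:

  perl comment_filter.pl < Foo.cpp | grep -n pattern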

Thanks, but if I am piping from stdin to stdout I see two problems:

(1) how do I implement the equivalent of grep's -n and -H options,
which report the line number and file name where the matches are?

(2) how do I make two passes: one to strip out the comments (and
preserve the original line breaks so I don't screw up the line
numbers) and the other to actually search for what I am looking for?


The only way I can see to do this is to make three passes:

  Pass #1: prepend the file name and current line number to the
beginning of each line (is there a way to interrogate stdin to get the
file name? See the sketch after this list.) For a file with a long
path name, that could easily double the memory requirement, since the
name is stored redundantly on every line.

  Pass #2: change all comments to spaces, except for new-lines

  Pass #3: search for the pattern and print the line it is found on

Now I could do this with pipes and three different instances of perl
running at the same time. Is there a better way?
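
For pass #1, this sketch is roughly what I have in mind, assuming the
script opens the files named on its own command line rather than
reading a bare pipe (a pipe carries no file name, but then $ARGV and
$. supply the file name and current line number):

#!/usr/bin/perl
# Sketch of pass #1 only: tag every line with "file:line:",
# the way grep -nH would, reading the file names from @ARGV.
use strict;
use warnings;

while ( my $line = <> ) {
    print "$ARGV:$.:$line";
    close ARGV if eof;   # reset $. at the end of each file
}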

So should I be concerned about memory problems? The worst files are
16K lines long and consume a megabyte. I'm running Windows with 2GB
RAM. Should I be worried about making multiple in-memory passes over a
1MB string (which becomes a 3MB string after I prepend the file name
and line number to the beginning of every line)? And how can I write
to a string instead of stdout and make an additional pass using the
technique described in "perldoc -q comments"?
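
One possibility I am considering for the "write to a string" part is
an in-memory filehandle (a sketch only; this needs perl 5.8 or later):

#!/usr/bin/perl
use strict;
use warnings;

# Print into a scalar instead of stdout, then reuse the scalar
# for a second, in-memory search pass.
my $stripped = '';
open my $out, '>', \$stripped or die "Cannot open in-memory handle: $!";
print {$out} "int x; /* example filtered output */\n";   # placeholder
close $out;

# $stripped now holds everything printed to $out; run the search
# pass over it here.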

Now I have queried this mailing list previously when I had a scraper
that ran for six hours scraping web sites. If I recall correctly,
perl's memory management was a bit of a problem. Will perl recycle my
memory properly if I keep using the same 3MB string variables over and
over again?

How do I read an entire file into a string? I know how to do it record
by record. Is there a more efficient way?
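
The slurp idiom I have seen mentioned looks like this (just a sketch;
the file name is only an example):

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'Example.cpp';                      # example name
open my $fh, '<', $file or die "Cannot open $file: $!";
my $source = do { local $/; <$fh> };           # undef $/ reads to EOF
close $fh;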


Here is one way to do what you want:


local $/;   # read whole files at once (slurp mode)

while ( @ARGV ) {

    $_ = <>;   # slurp the next file; its name is in $ARGV

    # Remove C/C++ comments (regex based on "perldoc -q comments"),
    # keeping only the newlines inside each comment so the original
    # line numbering is preserved.
    s!(/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*)|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)!
        defined $3 ? $3 : do { ( my $x = $1 ) =~ y/\n//cd; $x }
    !gse;

    my $line_num;

    for my $line ( split /\n/ ) {

        ++$line_num;

        print "File name: $ARGV  Line number: $line_num  Line: $line\n"
            if $line =~ /grep pattern/;   # put your search pattern here
    }
}
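
If you save that as, say, nocomment_grep.pl (pick your own name, and
replace /grep pattern/ with whatever you are searching for), you would
run it as:

  perl nocomment_grep.pl Foo.cpp Bar.java

Because the substitution keeps every newline from the comments it
removes, the line numbers it reports refer to the original, unmodified
files.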




John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
