siegfried wrote:
?????? wrote:
siegfried wrote:
I need to search large amounts of source code and grep is not doing the
job. The problem is that I keep matching stuff in the comments of the
C++/Java/Perl/Groovy/JavaScript source code.
Can someone give me some hints on where I might start on rewriting grep
in Perl so that it ignores the contents of /* ... */ comments?
Instead of rewriting grep, consider writing a comment filter. Have it
read from standard input and write to standard output; pipe the file
that you want to grep into it, and pipe its output into grep.
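For what it's worth, a bare-bones version of such a filter might look
like the sketch below. The script name is made up, and the regex is
adapted from "perldoc -q comments" (perlfaq6); note that it replaces
each comment with a single space, so a multi-line comment will shift
the line numbers that grep -n reports.

#!/usr/bin/perl
# comment_filter.pl - read C/C++/Java-style source on stdin, write it
# to stdout with /* ... */ and // comments blanked out, so the result
# can be piped into grep.
use strict;
use warnings;

my $code = do { local $/; <STDIN> };    # slurp everything from stdin

# $1 captures a comment, $2 captures a string literal or ordinary code.
# Keep $2, replace each comment with a single space.
$code =~ s{
    ( /\* [^*]* \*+ (?: [^/*] [^*]* \*+ )* /    # /* ... */ comment
    | // [^\n]*                                 # // comment
    )
  |
    ( " (?: \\. | [^"\\] )* "                   # double-quoted string
    | ' (?: \\. | [^'\\] )* '                   # single-quoted string
    | . [^/"'\\]*                               # anything else
    )
}{ defined $2 ? $2 : ' ' }gsex;

print $code;

You would then run something like
    comment_filter.pl < Foo.cpp | grep pattern
where Foo.cpp and the pattern are placeholders.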
Thanks, but if I am piping from stdin to stdout I see two problems:
(1) how do I implement the -n flag, which tells me the line number and
file name where the matches are?
(2) how do I make two passes: one to strip out the comments (and
preserve the original line breaks so I don't screw up the line
numbers) and the other to actually search for what I am looking for?
The only way I can see to do this is to make three passes:
Pass #1: prepend the file name and current line number to the beginning
of each line (is there a way to interrogate stdin to get the file
name?). With a long file and path name, that could easily double the
memory needed, since all that text is stored redundantly on every line.
(A sketch of this pass appears after this list.)
Pass #2: change everything inside comments to spaces, except the
newlines.
Pass #3: search for the pattern and print the line it is found on.
Now I could do this with pipes and three instances of perl running at
the same time. Is there a better way?
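Just as an illustration of Pass #1 on its own (the script name and the
file names are placeholders): reading the files through <> gives you
the current file name in $ARGV, and closing ARGV at end-of-file makes
$. restart at 1 for each file.

# prepend_lineno.pl - prefix every line with "file:line: "
while (<>) {
    print "$ARGV:$.: $_";
    close ARGV if eof;    # reset $. for the next input file
}

Run as, say: perl prepend_lineno.pl Foo.cpp Bar.java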
So, should I be concerned about memory problems? The worst files are
16K lines long and consume about a megabyte, and I'm running Windows
with 2GB of RAM. Should I worry about making multiple in-memory passes
over a 1MB string (one that becomes a 3MB string after I prepend the
file name and line number to the beginning of every line)? And how can
I write to a string instead of stdout, and then make an additional pass
using the technique described in "perldoc -q comments"?
Now I have queried this mailing list previously when I had a scraper
that ran for six hours scraping web sites. If I recall correctly,
perl's memory management was a bit of a problem. Will perl recycle my
memory properly if I keep using the same 3MB string variables over and
over again?
How do I read an entire file into a string? I know how to do it record
by record. Is there a more efficient way?
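The usual way to slurp a whole file into one scalar is to undefine the
input record separator for a single read; a small sketch (the file name
is just an example):

my $source = do {
    open my $fh, '<', 'Foo.cpp' or die "Foo.cpp: $!";
    local $/;      # undef $/ so <$fh> reads to end of file
    <$fh>;         # returned as one string
};

File::Slurp from CPAN wraps the same thing in a function if you prefer.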
Here is one way to do what you want:
local $/;                # slurp mode: read each file in one gulp
while ( @ARGV ) {
    $_ = <>;             # read the current file; <> puts its name in $ARGV

    # Remove C/C++ comments (regex based on "perldoc -q comments"), but
    # keep the newlines inside each comment so line numbers stay correct.
    s!(/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*)|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)!
        defined $3 ? $3 : do { ( my $x = $1 ) =~ y/\n//cd; $x }!gse;

    my $line_num = 0;
    for my $line ( split /\n/ ) {
        ++$line_num;
        print "File name: $ARGV Line number: $line_num Line: $line\n"
            if $line =~ /grep pattern/;
    }
}
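Assuming the above is saved as, say, nocomment_grep.pl (the name and
the example files are mine), you would run it with the files to search
as arguments:

    perl nocomment_grep.pl Foo.cpp Bar.java

The y/\n//cd in the replacement deletes everything in a matched comment
except its newlines, which is what keeps $line_num in step with the
original file.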
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall