>siegfried wrote:
>> I need to search large amounts of source code and grep is not doing the job.
>> The problem is that I keep matching stuff in the comments of the
>> C++/Java/Perl/Groovy/Javascript source code.
>>
>> Can someone give me some hints on where I might start on rewriting grep in
>> perl so that it ignores the contents of /* and */ comments?
>
>Instead of rewriting grep, consider writing a comment filter. Have it
>read from standard input and write to standard output; pipe the file
>that you want to grep into it, and pipe its output into grep.
Thanks, but if I am piping from stdin to stdout I see two problems: (1) how do I implement the -n flag that tells me the line number and file name of each match, and (2) how do I make two passes, one to strip out the comments (while preserving the original line breaks so I don't screw up the line numbers) and one to actually search for what I am looking for?

The only way I can see to do this is to make three passes:

Pass #1: prepend the file name and current line number to the beginning of each line (is there a way to interrogate stdin to get the file name?). With a long file and path name, that could easily double the memory requirement, since all that text is stored redundantly on every line.

Pass #2: change all comments to spaces, except the new-lines.

Pass #3: search for the pattern and print the line it is found on.

I could do this with pipes and three instances of perl running at the same time. Is there a better way? A few rough sketches of what I am picturing are at the end of this message.

Should I be concerned about memory? The worst files are 16K lines long and consume a megabyte, and I'm running Windows with 2GB of RAM. Should I worry about making multiple in-memory passes over a 1MB string (which becomes a 3MB string after I prepend the file name and line number to the beginning of every line)?

How can I write to a string instead of stdout, so that I can make an additional pass using the technique described in "perldoc -q comments"?

I have queried this mailing list before, when I had a scraper that ran for six hours scraping web sites. If I recall correctly, perl's memory management was a bit of a problem then. Will perl recycle my memory properly if I keep using the same 3MB string variables over and over again?

Finally, how do I read an entire file into a string? I know how to do it record by record; is there a more efficient way?

Thanks,
Siegfried
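P.S. To make the questions above concrete, here are a few rough, untested sketches. First, the file name question: stdin carries no file name at all, so a filter reading from a pipe cannot recover one. But if the perl script is handed the file names itself and reads them through the diamond operator, the name and line number come for free in $ARGV and $. — a minimal sketch, where PATTERN is just a placeholder for whatever is being searched:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # grep -n by hand: under the diamond operator, $ARGV holds the name
    # of the file currently being read and $. the current line number.
    while (<>) {
        print "$ARGV:$.: $_" if /PATTERN/;    # PATTERN is a placeholder
    } continue {
        close ARGV if eof;    # reset $. so numbering restarts per file
    }

Run as "perl thescript.pl *.java *.cpp" it behaves like grep -n across many files, with no pipe involved.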
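Second, the comment filter the previous poster suggested does not have to be a separate process: the three passes can collapse into one script if the whole file is slurped into a string, each /* ... */ comment is replaced by just the newlines it contained (so line numbers survive), and the search runs over the result. A sketch under those assumptions; it deliberately ignores // comments and comment delimiters hiding inside string literals:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = shift or die "usage: $0 file\n";

    # Slurp the whole file into one string: with $/ undefined there is
    # no record separator, so a single read returns everything.
    open my $fh, '<', $file or die "$file: $!\n";
    my $code = do { local $/; <$fh> };
    close $fh;

    # Replace each /* ... */ comment with only the newlines it spanned,
    # so every surviving line keeps its original line number.
    $code =~ s{/\*.*?\*/}{
        (my $nl = $&) =~ tr/\n//cd;    # delete everything except newlines
        $nl;
    }gse;

    # Search line by line, counting lines by hand.
    my $n = 0;
    for my $line (split /\n/, $code) {
        $n++;
        print "$file:$n: $line\n" if $line =~ /PATTERN/;    # placeholder
    }

The do { local $/; <$fh> } line is also the usual answer to the read-a-whole-file-into-a-string question: undefining the input record separator makes one read grab the entire file.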
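Third, on printing to a string instead of stdout: since Perl 5.8, open() accepts a reference to a scalar, so one pass can print into memory and a later pass can read the same scalar back, with no temporary file and no extra perl process. A minimal sketch:

    use strict;
    use warnings;

    # Since Perl 5.8, a filehandle can be opened on a scalar reference:
    # everything printed to it lands in the string.
    my $buffer = '';
    open my $out, '>', \$buffer or die "in-memory open failed: $!";
    print $out "output of the first pass\n";
    print $out "more output\n";
    close $out;

    # A later pass can read the same string back through a filehandle:
    open my $in, '<', \$buffer or die $!;
    while (my $line = <$in>) {
        # ... second pass works on $line here ...
    }
    close $in;

On the recycling question: assigning new contents to the same scalar reuses its existing buffer rather than leaking, so repeatedly overwriting a 3MB string should be no trouble in 2GB of RAM — though that is a general expectation, not a measurement.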