>siegfried wrote:
>> I need to search large amounts of source code and grep is not doing the job.
>> The problem is that I keep matching stuff in the comments of the
>> C++/Java/Perl/Groovy/Javascript source code.
>>
>> Can someone give me some hints on where I might start on rewriting grep in
>> perl so that it ignores the contents of /* and */ comments?
>
>Instead of rewriting grep, consider writing a comment filter. Have it
>read from standard input and write to standard output; pipe the file
>that you want to grep into it, and pipe its output into grep.
Thanks, but if I am piping from stdin to stdout I see two problems: (1) how do I implement the -n flag that tells me the line number and file name of each match, and (2) how do I make two passes, one to strip out the comments (while preserving the original line breaks so I don't screw up the line numbers) and one to actually search for what I am looking for?

The only way I can see to do this is to make three passes:

Pass #1: prepend the file name and current line number to the beginning of each line (is there a way to interrogate stdin to get the file name?). With a long file and path name, that could easily double the memory requirement, since all that text is stored redundantly on every line.

Pass #2: change all comments to spaces, except the new-lines.

Pass #3: search for the pattern and print the line it is found on.

I could do this with pipes and three instances of perl running at the same time. Is there a better way? A few rough sketches of what I am picturing are at the end of this message.

Should I be concerned about memory? The worst files are 16K lines long and consume a megabyte, and I'm running Windows with 2GB of RAM. Should I worry about making multiple in-memory passes over a 1MB string (which becomes a 3MB string after I prepend the file name and line number to the beginning of every line)?

How can I write to a string instead of stdout, so that I can make an additional pass using the technique described in "perldoc -q comments"?

I have queried this mailing list before, when I had a scraper that ran for six hours scraping web sites. If I recall correctly, perl's memory management was a bit of a problem then. Will perl recycle my memory properly if I keep using the same 3MB string variables over and over again?

Finally, how do I read an entire file into a string? I know how to do it record by record; is there a more efficient way?

Thanks,
Siegfried
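P.S. To make the questions above concrete, here are a few rough, untested sketches. First, the file name question: stdin carries no file name at all, so a filter reading from a pipe cannot recover one. But if the perl script is handed the file names itself and reads them through the diamond operator, the name and line number come for free in $ARGV and $. — a minimal sketch, where PATTERN is just a placeholder for whatever is being searched:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # grep -n by hand: under the diamond operator, $ARGV holds the name
    # of the file currently being read and $. the current line number.
    while (<>) {
        print "$ARGV:$.: $_" if /PATTERN/;    # PATTERN is a placeholder
    } continue {
        close ARGV if eof;    # reset $. so numbering restarts per file
    }

Run as "perl thescript.pl *.java *.cpp" it behaves like grep -n across many files, with no pipe involved.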
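Second, the comment filter the previous poster suggested does not have to be a separate process: the three passes can collapse into one script if the whole file is slurped into a string, each /* ... */ comment is replaced by just the newlines it contained (so line numbers survive), and the search runs over the result. A sketch under those assumptions; it deliberately ignores // comments and comment delimiters hiding inside string literals:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = shift or die "usage: $0 file\n";

    # Slurp the whole file into one string: with $/ undefined there is
    # no record separator, so a single read returns everything.
    open my $fh, '<', $file or die "$file: $!\n";
    my $code = do { local $/; <$fh> };
    close $fh;

    # Replace each /* ... */ comment with only the newlines it spanned,
    # so every surviving line keeps its original line number.
    $code =~ s{/\*.*?\*/}{
        (my $nl = $&) =~ tr/\n//cd;    # delete everything except newlines
        $nl;
    }gse;

    # Search line by line, counting lines by hand.
    my $n = 0;
    for my $line (split /\n/, $code) {
        $n++;
        print "$file:$n: $line\n" if $line =~ /PATTERN/;    # placeholder
    }

The do { local $/; <$fh> } line is also the usual answer to the read-a-whole-file-into-a-string question: undefining the input record separator makes one read grab the entire file.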
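Third, on printing to a string instead of stdout: since Perl 5.8, open() accepts a reference to a scalar, so one pass can print into memory and a later pass can read the same scalar back, with no temporary file and no extra perl process. A minimal sketch:

    use strict;
    use warnings;

    # Since Perl 5.8, a filehandle can be opened on a scalar reference:
    # everything printed to it lands in the string.
    my $buffer = '';
    open my $out, '>', \$buffer or die "in-memory open failed: $!";
    print $out "output of the first pass\n";
    print $out "more output\n";
    close $out;

    # A later pass can read the same string back through a filehandle:
    open my $in, '<', \$buffer or die $!;
    while (my $line = <$in>) {
        # ... second pass works on $line here ...
    }
    close $in;

On the recycling question: assigning new contents to the same scalar reuses its existing buffer rather than leaking, so repeatedly overwriting a 3MB string should be no trouble in 2GB of RAM — though that is a general expectation, not a measurement.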