Re: How to reinvent grep in perl?
From: Chas. Owens
> On 10/3/07, Jonathan Lang wrote:
> snip
> > Chas shows one possibility. However, that approach generally involves slurping the entire file into the perl script, applying the regex to the whole thing, and then spitting the result out again. From what I understand, this generally isn't very good form.
> snip
>
> The reason slurping files is frowned upon is that you do not know how large the file may be. You may be testing with small files, but production may be using multi-gig files. Source files are almost always less than a few megabytes (and if they aren't you have a bigger problem than a slurp).

If I found a source file that was even just one megabyte, I'd first make sure it's really a source file and not a generated intermediate file, and if it's not, I'd go and start shouting at the person who created the file. (Yes, I do have a source file that's 1.08MB. It's generated.)

In this case I'd check that the size is reasonable and slurp. It'll make the code much simpler.

Jenda
http://Jenda.Krynicky.cz
When it comes to wine, women and song, wizards are allowed to get drunk and croon as much as they like. -- Terry Pratchett in Sourcery

-- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
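The "check that the size is reasonable and slurp" idea could be sketched roughly as follows. This is only an illustration, not code from the thread: the sub name slurp_if_small and the 2MB default limit are hypothetical choices, picked to match the "one to two megs" figure discussed here.

```perl
use strict;
use warnings;

# Hypothetical default limit; choose whatever "reasonable" means for you.
my $MAX_BYTES = 2 * 1024 * 1024;

# Return the contents of $path as one string, or nothing (undef in
# scalar context) if it is not a plain file, is larger than the limit,
# or cannot be opened.
sub slurp_if_small {
    my ($path, $max) = @_;
    $max = $MAX_BYTES unless defined $max;

    return unless -f $path;            # missing, or not a plain file
    my $size = (-s $path) || 0;        # -s is false for an empty file
    if ($size > $max) {
        warn "$path is $size bytes, over the $max byte limit; skipping\n";
        return;
    }

    open(my $fh, '<', $path)
        or do { warn "Cannot open $path: $!\n"; return };
    local $/;                          # no record separator: read it all
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}
```

A caller would test the result with defined() and fall back to line-by-line processing (or just skip the file) when nothing comes back.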
Re: How to reinvent grep in perl?
On 10/8/07, Jenda Krynicky wrote:
> From: Chas. Owens
> > On 10/3/07, Jonathan Lang wrote:
> > snip
> > > Chas shows one possibility. However, that approach generally involves slurping the entire file into the perl script, applying the regex to the whole thing, and then spitting the result out again. From what I understand, this generally isn't very good form.
> > snip
> >
> > The reason slurping files is frowned upon is that you do not know how large the file may be. You may be testing with small files, but production may be using multi-gig files. Source files are almost always less than a few megabytes (and if they aren't you have a bigger problem than a slurp).
>
> If I found a source file that would be even just one megabyte, I'd first make sure it's really a source file and not a generated intermediate file and if it's not I'd go and start shouting at the person that created the file. (Yes, I do have a source file that's 1.08MB. It's generated.)
>
> In this case I'd check that the size is reasonable and slurp. It'll make the code much simpler.
snip

Yeah, most source code files are smaller than 250k, but one to two megs covers the outliers. The point is that slurping is only bad if you don't know how big the files are going to be. If source code grows large enough to matter, then you have more serious problems than a Perl script eating a lot of memory (also, if Perl can't slurp it, how do you edit it? With ed/edlin?).
Re: How to reinvent grep in perl?
siegfried wrote:
> ?? wrote:
> > siegfried wrote:
> > > I need to search large amounts of source code and grep is not doing the job. The problem is that I keep matching stuff in the comments of the C++/Java/Perl/Groovy/Javascript source code. Can someone give me some hints on where I might start on rewriting grep in perl so that it ignores the contents of /* and */ comments?
> >
> > Instead of rewriting grep, consider writing a comment filter. Have it read from standard input and write to standard output; pipe the file that you want to grep into it, and pipe its output into grep.
>
> Thanks, but if I am piping from stdin to stdout I see two problems:
>
> (1) How do I implement the -n flags that tell me the line number and file name where the matches are?
> (2) How do I make two passes: one to strip out the comments (and preserve the original line breaks so I don't screw up the line numbers) and the other to actually search for what I am looking for?
>
> The only way I can see to do this is to make three passes:
>
> Pass #1: prepend the file name and current line number to the beginning of each line (is there a way to interrogate stdin to get the file name?). With a long file and path name, that could easily double the memory requirement, since all of that is stored redundantly on each line.
> Pass #2: change all comments to spaces, except for newlines.
> Pass #3: search for the pattern and print the line it is found on.
>
> Now I could do this with pipes and three different instances of perl running at the same time. Is there a better way?
>
> So am I concerned about memory problems? The worst files are 16K lines long and consume a megabyte. I'm running Windows with 2GB RAM. Should I be concerned about making multiple in-memory passes over a 1MB string (that becomes a 3MB string after I prepend the file name and line number to the beginning of every line)? How can I write to a string instead of stdout and make an additional pass using the technique described in perldoc -q comments?
>
> Now I have queried this mailing list previously when I had a scraper that ran for six hours scraping web sites. If I recall correctly, perl's memory management was a bit of a problem. Will perl recycle my memory properly if I keep using the same 3MB string variables over and over again?
>
> How do I read an entire file into a string? I know how to do it record by record. Is there a more efficient way?

Here is one way to do what you want:

    local $/;    # slurp whole file
    while ( @ARGV ) {
        $_ = <>;    # get current file - puts file name in $ARGV

        # remove C/C++ comments (based on perlfaq),
        # keeping only the newlines inside them so line numbers survive
        s!(/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*)|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)!
            defined $3 ? $3 : do { ( my $x = $1 ) =~ y/\n//cd; $x }
        !gse;

        my $line_num;
        for my $line ( split /\n/ ) {
            ++$line_num;
            print "File name: $ARGV  Line number: $line_num  Line: $line\n"
                if $line =~ /grep pattern/;
        }
    }

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
RE: How to reinvent grep in perl?
> siegfried wrote:
> > I need to search large amounts of source code and grep is not doing the job. The problem is that I keep matching stuff in the comments of the C++/Java/Perl/Groovy/Javascript source code. Can someone give me some hints on where I might start on rewriting grep in perl so that it ignores the contents of /* and */ comments?
>
> Instead of rewriting grep, consider writing a comment filter. Have it read from standard input and write to standard output; pipe the file that you want to grep into it, and pipe its output into grep.

Thanks, but if I am piping from stdin to stdout I see two problems:

(1) How do I implement the -n flags that tell me the line number and file name where the matches are?
(2) How do I make two passes: one to strip out the comments (and preserve the original line breaks so I don't screw up the line numbers) and the other to actually search for what I am looking for?

The only way I can see to do this is to make three passes:

Pass #1: prepend the file name and current line number to the beginning of each line (is there a way to interrogate stdin to get the file name?). With a long file and path name, that could easily double the memory requirement, since all of that is stored redundantly on each line.
Pass #2: change all comments to spaces, except for newlines.
Pass #3: search for the pattern and print the line it is found on.

Now I could do this with pipes and three different instances of perl running at the same time. Is there a better way?

So am I concerned about memory problems? The worst files are 16K lines long and consume a megabyte. I'm running Windows with 2GB RAM. Should I be concerned about making multiple in-memory passes over a 1MB string (that becomes a 3MB string after I prepend the file name and line number to the beginning of every line)? How can I write to a string instead of stdout and make an additional pass using the technique described in perldoc -q comments?

Now I have queried this mailing list previously when I had a scraper that ran for six hours scraping web sites. If I recall correctly, perl's memory management was a bit of a problem. Will perl recycle my memory properly if I keep using the same 3MB string variables over and over again?

How do I read an entire file into a string? I know how to do it record by record. Is there a more efficient way?

Thanks,
Siegfried
Re: How to reinvent grep in perl?
siegfried wrote:
> Thanks, but if I am piping from stdin to stdout I see two problems:
> (1) how do I implement the -n flags that tell me the line number and file name where the matches are

Well, as long as you're only piping one file at a time, the line number part isn't a problem; but I see your point otherwise. OK; forget the suggestion about a separate filter.

> (2) how do I make two passes: one to strip out the comments (and preserve the original line breaks so I don't screw up the line numbers) and the other to actually search for what I am looking for?

If you slurp the entire file into a single string, you need to make two passes as you describe. Additionally, this makes the regexes used to purge comments even messier, since you need to modify them to preserve newlines within the comments.

If you use the general approach that I outlined, where you're tackling each file on a line-by-line basis, you don't need to make two passes, per se. Instead, write any code or quotes that you find on a line into a string buffer as you find them, and apply the grep's regex to that buffer whenever you finish filtering a line:

    # pseudocode: filter() and $pattern are not defined here
    foreach my $file (@ARGV) {
        if (open FILE, '<', $file) {
            my $line = 0;
            my $context = 'code';
            while (<FILE>) {
                chomp;
                $line++;
                my ($text, $comment) = filter($_, \$context);
                print "$file $line: $_\n" if $text =~ $pattern;
            }
            close FILE;
        }
    }

The magic is in how you write 'filter()'. See my previous post for a summary of the logic behind it.

> How do I read an entire file into a string? I know how to do it record by record. Is there a more efficient way?

If you want to slurp an entire file into a single string:

    $string = join '', <FILE>;

In list context, <FILE> returns all of the file's lines as a list, which join then concatenates seamlessly.

--
Jonathan "Dataweaver" Lang
How to reinvent grep in perl?
I need to search large amounts of source code and grep is not doing the job. The problem is that I keep matching stuff in the comments of the C++/Java/Perl/Groovy/Javascript source code. Can someone give me some hints on where I might start on rewriting grep in perl so that it ignores the contents of /* and */ comments?

Thanks,
Siegfried
Re: How to reinvent grep in perl?
By strange coincidence, I just picked up where I had left off reading Minimal Perl by Tim Maher, which goes into some detail about replacing grep (among other functionality) with either perl one-liners or scripts, depending on preference/need. This book may well be worth your time and money.

On 10/3/07, siegfried wrote:
> I need to search large amounts of source code and grep is not doing the job. The problem is that I keep matching stuff in the comments of the C++/Java/Perl/Groovy/Javascript source code. Can someone give me some hints on where I might start on rewriting grep in perl so that it ignores the contents of /* and */ comments?
>
> Thanks,
> Siegfried
Re: How to reinvent grep in perl?
On 10/3/07, siegfried wrote:
> I need to search large amounts of source code and grep is not doing the job. The problem is that I keep matching stuff in the comments of the C++/Java/Perl/Groovy/Javascript source code. Can someone give me some hints on where I might start on rewriting grep in perl so that it ignores the contents of /* and */ comments?
snip

    perldoc -q comments

will get you halfway there.
Re: How to reinvent grep in perl?
On 10/3/07, Jonathan Lang wrote:
snip
> Chas shows one possibility. However, that approach generally involves slurping the entire file into the perl script, applying the regex to the whole thing, and then spitting the result out again. From what I understand, this generally isn't very good form.
snip

The reason slurping files is frowned upon is that you do not know how large the file may be. You may be testing with small files, but production may be using multi-gig files. Source files are almost always less than a few megabytes (and if they aren't you have a bigger problem than a slurp).
Re: How to reinvent grep in perl?
siegfried wrote:
> I need to search large amounts of source code and grep is not doing the job. The problem is that I keep matching stuff in the comments of the C++/Java/Perl/Groovy/Javascript source code. Can someone give me some hints on where I might start on rewriting grep in perl so that it ignores the contents of /* and */ comments?

Instead of rewriting grep, consider writing a comment filter. Have it read from standard input and write to standard output; pipe the file that you want to grep into it, and pipe its output into grep.

As for the filter itself: there's probably an elegant way to use regexes to trim out the comments; the 'perldoc -q comments' suggestion made by Chas shows one possibility. However, that approach generally involves slurping the entire file into the perl script, applying the regex to the whole thing, and then spitting the result out again. From what I understand, this generally isn't very good form.

A messier approach that has the benefit of being less memory-intensive and of producing output with less of a delay would be to write a contextual switchboard: read the input stream a little at a time; decide if a given block of characters is code, quote, or comment; and send it on to the output stream if it is code or quote. (The reason for distinguishing between code and quote is that '/*' and '*/' don't denote comments when they appear within quotes.)

The key to this approach is the context. Start the program in 'code' context, and start reading the input stream a line at a time. For a given line, search for the first instance of characters that would denote the beginning of a comment or quote. If you don't find anything, send the whole line to the output stream; if you find something, send everything before it to the output stream, switch to quote or comment context as appropriate, and examine the rest of the line under the new context.

Under quote context, do exactly the same as in code context, except that what you're looking for is the earliest character combination that will end the quote. In particular, you are _not_ looking for the start of a comment. When you find the end of the quote, you switch back to code context for the rest of the line.

Comment context works almost exactly like quote context, except that you're looking exclusively for the end of the comment, and you throw away everything prior to it instead of sending it to the output stream. For the purpose of maintaining line counts, send a newline character to the output stream every time you hit the end of a line while in comment context.

(Alternatively, you might consider writing the script to dump the comment contents to stderr, which could be useful if you ever want to grep the contents of the comments instead of the code. If you do this, be sure to send a newline to stderr whenever you end a line in code or quote context.)

--
Jonathan "Dataweaver" Lang
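The code/quote/comment switchboard described above might be sketched like this. It is one possible reading of the description, not the poster's actual code: the sub is named filter() only to echo the name used elsewhere in the thread, and it makes several simplifying assumptions (C-style /* */ and // comments, single- and double-quoted strings with backslash escapes, input lines already chomped). Perl-style # comments, heredocs and regex literals are not handled.

```perl
use strict;
use warnings;

# Patterns that find the end of an open quote, keyed by quote character.
my %quote_end = (
    '"' => qr/^((?:\\.|[^"\\])*)"/,
    "'" => qr/^((?:\\.|[^'\\])*)'/,
);

# filter($line, \$context) returns ($text, $comment) for one chomped line.
# $$context is 'code', 'comment', or the currently open quote character,
# and carries over from one line to the next.
sub filter {
    my ($line, $context) = @_;
    my ($text, $comment) = ('', '');

    while (length $line) {
        if ($$context eq 'code') {
            # Look for the first thing that starts a comment or a quote.
            if ($line =~ m{(/\*|//|["'])}) {
                my $start = $1;
                $text .= substr($line, 0, $-[0]);
                $line  = substr($line, $+[0]);
                if ($start eq '//') {        # comment runs to end of line
                    $comment .= $line;
                    $line = '';
                }
                elsif ($start eq '/*') {
                    $$context = 'comment';
                }
                else {                       # the quote itself counts as text
                    $text .= $start;
                    $$context = $start;
                }
            }
            else {
                $text .= $line;
                $line = '';
            }
        }
        elsif ($$context eq 'comment') {
            # Only the end of the comment matters here.
            if ($line =~ m{\*/}) {
                $comment .= substr($line, 0, $-[0]);
                $line     = substr($line, $+[0]);
                $$context = 'code';
            }
            else {
                $comment .= $line;
                $line = '';
            }
        }
        else {
            # Quote context: only the closing quote matters, not '/*'.
            my $q = $$context;
            if ($line =~ $quote_end{$q}) {
                $text .= $1 . $q;
                $line  = substr($line, $+[0]);
                $$context = 'code';
            }
            else {
                $text .= $line;
                $line = '';
            }
        }
    }
    return ($text, $comment);
}
```

Because the caller reads one line at a time and prints the original line when $text matches, line numbers stay correct without emitting placeholder newlines; $comment is returned so that the "grep the comments instead" variant is possible too.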
Re: How to reinvent grep with perl? (OT: Cygwin grep)
Bakken, Luke writes:
> Voila. That's most likely your problem - a mismatch between line endings and Cygwin mount point type.

And in case you hadn't seen them before... there are at least a few sets of unix tools for dos/windows. Cygwin may be the best known, but I've used Uwin myself for some time and never had a problem with its grep.

http://www.research.att.com/sw/tools/uwin/
RE: How to reinvent grep with perl? (OT: Cygwin grep)
> > Voila. That's most likely your problem - a mismatch between line endings and Cygwin mount point type.
>
> And in case you hadn't seen them before... there are at least a few sets of unix tools for dos/windows. Cygwin may be the best known, but I've used Uwin myself for some time and never had a problem with its grep.
>
> http://www.research.att.com/sw/tools/uwin/

And another suggestion: http://gnuwin32.sourceforge.net
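To illustrate the line-ending diagnosis, here is a small Perl sketch. It assumes the files have DOS (CRLF) line endings and reach the pattern without any CRLF translation; whether that is what actually happened on the poster's Cygwin setup is not confirmed in the thread. Under that assumption, "$" matches just before the final newline but not before a carriage return, so the pattern from this thread silently fails on every line.

```perl
use strict;
use warnings;

my $unix_line = "      END\n";
my $dos_line  = "      END\r\n";   # what a CRLF file looks like if \r is kept

my $strict   = qr/^ *END *$/;      # the pattern used in this thread
my $tolerant = qr/^ *END *\r?$/;   # also allows a carriage return before \n

for my $pair ([ 'unix', $unix_line ], [ 'dos', $dos_line ]) {
    my ($name, $line) = @$pair;
    printf "%-4s strict=%s tolerant=%s\n", $name,
        ($line =~ $strict   ? 'match' : 'no match'),
        ($line =~ $tolerant ? 'match' : 'no match');
}
```

Allowing an optional \r in the pattern is only a workaround; fixing the mount type or converting the files addresses the cause.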
How to reinvent grep with perl?
My man pages and info pages are not working well and I cannot figure out how to make grep search for a certain pattern. I even tried egrep and fgrep. So how do I reinvent grep with perl? Here is my attempt:

    perl -n -e 'print "$. $_" if /^ *END *$/' *.f

This works better than grep, except for the fact that it does not print the file name. How can I make perl print the file name?

How do I do this with (f\|e\|)grep? - woops - that is off topic. Never mind then. Just, how do I print the file name using perl?

Thanks,
Siegfried
Re: How to reinvent grep with perl?
Siegfried Heintze wrote:
> My man pages and info pages are not working well and I cannot figure out how to make grep search for a certain pattern. I even tried egrep and fgrep. So how do I reinvent grep with perl? Here is my attempt:

There's no need. When you do 'man whatever', you can hit '/', type a search term, and hit enter. To search for that same term again, '/' then enter will do. This works on my Gentoo Linux box.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
Re: How to reinvent grep with perl?
Siegfried Heintze wrote:
> Andrew,
> Thanks. When I hit n to go to the next page, it says "No previous regular expression (press RETURN)". So I can only display the first page. I have it expanded to the full screen but I still cannot see the portion of the display that tells me how to use extended regular expressions. Apparently the basic regular expressions don't include ^ and $.

If your 'man' uses 'less' like mine does, you hit Space to go to the next page and use the arrow keys to scroll one line at a time. I haven't used 'info' in a while, but I believe you can search those with 's'.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
Re: How to reinvent grep with perl?
Siegfried Heintze writes:
> My man pages and info pages are not working well and I cannot figure out how to make grep search for a certain pattern. I even tried egrep and fgrep. So how do I reinvent grep with perl? Here is my attempt:
>
>     perl -n -e 'print "$. $_" if /^ *END *$/' *.f

    grep -n '^ *END *$' *.f
Re: How to reinvent grep with perl?
Siegfried Heintze writes:
> This works better than grep, except for the fact it does not print the file name. How can I make perl print the file name?

How is it better than grep?
RE: How to reinvent grep with perl?
Perl works better than grep because the grep statement you give below does not find any instances of the pattern and perl finds quite a few. I'm using Cygwin on Win2003Server. Since there is something obviously wrong with my Cygwin implementation of grep, how do I get the file names with perl?

This works, but it sure is ugly. Is there not an easier way to do this with perl?

    perl -e '@ARGV = ("-") unless @ARGV; while (@ARGV) { $ARGV = shift @ARGV; if (!open(ARGV, $ARGV)) { warn "Cannot open $ARGV: $!\n"; next; } while (<ARGV>) { print "$ARGV:$.:$_\n" if /^ *END *$/; } }' *.f

Thanks,
Siegfried

-----Original Message-----
From: news On Behalf Of Harry Putnam
Sent: Saturday, October 09, 2004 4:33 PM
Subject: Re: How to reinvent grep with perl?

Siegfried Heintze writes:
> My man pages and info pages are not working well and I cannot figure out how to make grep search for a certain pattern. I even tried egrep and fgrep. So how do I reinvent grep with perl? Here is my attempt:
>
>     perl -n -e 'print "$. $_" if /^ *END *$/' *.f

    grep -n '^ *END *$' *.f
Re: How to reinvent grep with perl?
Siegfried Heintze writes:
> This works, but it sure is ugly. Is there not an easier way to do this with perl?
>
>     perl -e '@ARGV = ("-") unless @ARGV; while (@ARGV) { $ARGV = shift @ARGV; if (!open(ARGV, $ARGV)) { warn "Cannot open $ARGV: $!\n"; next; } while (<ARGV>) { print "$ARGV:$.:$_\n" if /^ *END *$/; } }' *.f

Doesn't something simple like:

    perl -n -e 'print "$ARGV\n$. $_" if /PATTERN/' *.f

work for you?
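One detail worth adding when relying on $ARGV and $. together: with the <> operator, $. keeps counting across files unless ARGV is closed at the end of each one (the eof entry in perlfunc shows this idiom). A minimal sketch of a grep -n style helper built on that follows; the sub name grep_files and the "file:line:text" output format are illustrative choices, not something given in the thread.

```perl
use strict;
use warnings;

# Return "file:line:text" for every line in @files matching $pattern,
# roughly what `grep -n` prints when given several files.
sub grep_files {
    my ($pattern, @files) = @_;
    return unless @files;          # avoid falling back to STDIN
    my @hits;
    local @ARGV = @files;          # <> reads the files named in @ARGV
    while (my $line = <>) {
        push @hits, "$ARGV:$.:$line" if $line =~ $pattern;
    }
    continue {
        close ARGV if eof;         # reset $. so numbering restarts per file
    }
    return @hits;
}

# Example: perl thisscript.pl *.f
print grep_files(qr/^ *END *$/, @ARGV) if @ARGV;
```

The same idea fits in a one-liner by adding "close ARGV if eof" after the print statement in the -n loop.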