Re: How to reinvent grep in perl?

2007-10-08 Thread Jenda Krynicky
From: Chas. Owens [EMAIL PROTECTED]
 On 10/3/07, Jonathan Lang [EMAIL PROTECTED] wrote:
 snip
  Chas shows one possibility.  However, that approach generally involves
  slurping the entire file into the perl script, applying the regex to
  the whole thing, and then spitting the result out again.  From what I
  understand, this generally isn't very good form.
 snip
 
 The reason slurping files is frowned upon is that you do not know how
 large the file may be.  You may be testing with small files, but
 production may be using multi-gig files.  Source files are almost
 always less than a few megabytes (and if they aren't you have a bigger
 problem than a slurp).

If I found a source file that was even just one megabyte, I'd 
first make sure it's really a source file and not a generated 
intermediate file, and if it isn't generated I'd go and start shouting 
at the person who created it. (Yes, I do have a source file that's 
1.08MB. It's generated.) In this case I'd check that the size is 
reasonable and slurp. It'll make the code much simpler.
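
A minimal sketch of the "check the size, then slurp" idea, for illustration only. The 2 MB limit and the name slurp_if_reasonable are placeholders of my own, not something from this thread:

```perl
use strict;
use warnings;

# Hypothetical limit; pick whatever "reasonable" means for your code base.
my $MAX_BYTES = 2 * 1024 * 1024;

# Return the file's contents as one string, or undef if it is too large.
sub slurp_if_reasonable {
    my ($file, $max) = @_;
    my $size = (-s $file) || 0;     # -s gives the size in bytes
    return if $size > $max;         # too big: let the caller decide
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    local $/;                       # no record separator: read it all at once
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}

for my $file (@ARGV) {
    my $text = slurp_if_reasonable($file, $MAX_BYTES);
    if (defined $text) {
        printf "%s: slurped %d bytes\n", $file, length $text;
    }
    else {
        warn "$file: larger than $MAX_BYTES bytes, skipping\n";
    }
}
```

A file over the limit could be skipped, as here, or handed to a line-by-line fallback.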

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/




Re: How to reinvent grep in perl?

2007-10-08 Thread Chas. Owens
On 10/8/07, Jenda Krynicky [EMAIL PROTECTED] wrote:
 From: Chas. Owens [EMAIL PROTECTED]
  On 10/3/07, Jonathan Lang [EMAIL PROTECTED] wrote:
  snip
   Chas shows one possibility.  However, that approach generally involves
   slurping the entire file into the perl script, applying the regex to
   the whole thing, and then spitting the result out again.  From what I
   understand, this generally isn't very good form.
  snip
 
  The reason slurping files is frowned upon is that you do not know how
  large the file may be.  You may be testing with small files, but
  production may be using multi-gig files.  Source files are almost
  always less than a few megabytes (and if they aren't you have a bigger
  problem than a slurp).

 If I found a source file that would be even just one megabyte, I'd
 first make sure it's really a source file and not a generated
 intermediate file and if it's not I'd go and start shouting at the
 person that created the file. (Yes, I do have a source file that's
 1.08MB. It's generated.) In this case I'd check that the size is
 reasonable and slurp. It'll make the code much simpler.
snip

Yeah, most source code files are smaller than 250k, but one to two
megs covers the outliers.  The point is that slurping is only bad if
you don't know how big the files are going to be.  If source code
grows large enough to matter then you have more serious problems than
a Perl script eating a lot of memory (also, if Perl can't slurp it,
how do you edit it? with ed/edlin?).





Re: How to reinvent grep in perl?

2007-10-05 Thread John W. Krahn

siegfried wrote:
 ?? wrote:
  siegfried wrote:
   I need to search large amounts of source code and grep is not doing the
   job. The problem is that I keep matching stuff in the comments of the
   C++/Java/Perl/Groovy/Javascript source code.

   Can someone give me some hints on where I might start on rewriting grep
   in perl so that it ignores the contents of /* and */ comments?

  Instead of rewriting grep, consider writing a comment filter.  Have it
  read from standard input and write to standard output; pipe the file
  that you want to grep into it, and pipe its output into grep.

 Thanks, but if I am piping from stdin to stdout I see two problems:

 (1) how do I implement the -n flags that tell me the line number and
 file name where the matches are

 (2) how do I make two passes: one to strip out the comments (and
 preserve the original line breaks so I don't screw up the line
 numbers) and the other to actually search for what I am looking for?

 The only way I can see to do this is to make three passes:

   Pass #1: prepend the file name and current line number on to the
 beginning of each line (is there a way to interrogate stdin to get the
 file name?) So on a path with a long file and path name, that could
 easily double the memory requirement to store all that stuff
 redundantly on each line.

   Pass #2: change all comments to spaces except new-lines
   Pass #3: search for the pattern and print the line it is found on

 Now I could do this with pipes and 3 different instances of perl
 running at the same time. Is there a better way?

 So am I concerned about memory problems? The worst files are 16K lines
 long and consume a megabyte. I'm running windows with 2GB RAM. Should
 I be concerned about making multiple in memory passes over a 1MB
 string (that becomes a 3MB string after I prepend the file name and
 line number to the beginning of every line)?  How can I write to a
 string instead of stdout and make an additional pass using the technique
 described in perldoc -q comments?

 Now I have queried this mailing list previously when I had a scraper
 that ran for six hours scraping web sites. If I recall correctly,
 perl's memory management was a bit of a problem. Will perl recycle my
 memory properly if I keep using the same 3MB string variables over and
 over again?

 How do I read an entire file into a string? I know how to do it record
 by record. Is there a more efficient way?


Here is one way to do what you want:


    local $/;   # undefine the record separator so <> slurps a whole file

    while ( @ARGV ) {
        $_ = <>;   # get current file - puts file name in $ARGV

        # remove C/C++ comments (based on perlfaq)
        # and remove everything except newlines
        s!(/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*)|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)!
            defined $3 ? $3 : do { ( my $x = $1 ) =~ y/\n//cd; $x } !gse;

        my $line_num;
        for my $line ( split /\n/ ) {
            ++$line_num;
            print "File name: $ARGV  Line number: $line_num  Line: $line\n"
                if $line =~ /grep pattern/;
        }
    }
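
To try the newline-preserving substitution in isolation, here is a sketch that wraps the same idea in two functions. The names blank_comments and grep_code are hypothetical. The regex is a commented /x rewrite of the one above with non-capturing inner groups, so the "keep this" branch is $2 rather than $3. It only knows about /* */ and // comments, so Perl's # comments are not handled.

```perl
use strict;
use warnings;

# Replace each comment with only the newlines it contained, so that line
# numbers in the result still match the original text.
sub blank_comments {
    my ($src) = @_;
    $src =~ s{
        ( /\*[^*]*\*+(?:[^/*][^*]*\*+)*/    # group 1: a block comment
        | //[^\n]*                          #   or a comment to end of line
        )
      | ( "(?:\\.|[^"\\])*"                 # group 2: double-quoted string
        | '(?:\\.|[^'\\])*'                 #   or single-quoted string
        | .[^/"'\\]*                        #   or any other run of code
        )
    }{
        defined $2 ? $2 : do { ( my $x = $1 ) =~ tr/\n//cd; $x }
    }gsex;
    return $src;
}

# Return "line_number: text" for every comment-free line that matches $re.
sub grep_code {
    my ($src, $re) = @_;
    my @hits;
    my $n = 0;
    for my $line ( split /\n/, blank_comments($src) ) {
        ++$n;
        push @hits, "$n: $line" if $line =~ $re;
    }
    return @hits;
}
```

Because a multi-line comment is replaced by the same number of newlines, the line numbers reported by grep_code refer to the original file.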




John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.-- Larry Wall





RE: How to reinvent grep in perl?

2007-10-04 Thread siegfried

siegfried wrote:
 I need to search large amounts of source code and grep is not doing the
 job. The problem is that I keep matching stuff in the comments of the
 C++/Java/Perl/Groovy/Javascript source code.

 Can someone give me some hints on where I might start on rewriting grep
 in perl so that it ignores the contents of /* and */ comments?

Instead of rewriting grep, consider writing a comment filter.  Have it
read from standard input and write to standard output; pipe the file
that you want to grep into it, and pipe its output into grep.


Thanks, but if I am piping from stdin to stdout I see two problems:

(1) how do I implement the -n flags that tell me the line number and
file name where the matches are

(2) how do I make two passes: one to strip out the comments (and
preserve the original line breaks so I don't screw up the line
numbers) and the other to actually search for what I am looking for?


The only way I can see to do this is to make three passes:

  Pass #1: prepend the file name and current line number on to the
beginning of each line (is there a way to interrogate stdin to get the
file name?) So on a path with a long file and path name, that could
easily double the memory requirement to store all that stuff
redundantly on each line.

  Pass #2: change all comments to spaces except new-lines
  Pass #3: search for the pattern and print the line it is found on 

Now I could do this with pipes and 3 different instances of perl
running at the same time. Is there a better way?

So am I concerned about memory problems? The worst files are 16K lines
long and consume a megabyte. I'm running windows with 2GB RAM. Should
I be concerned about making multiple in memory passes over a 1MB
string (that becomes a 3MB string after I prepend the file name and
line number to the beginning of every line)?  How can I write to a
string instead of stdout and make an additional pass using the technique
described in perldoc -q comments?

Now I have queried this mailing list previously when I had a scraper
that ran for six hours scraping web sites. If I recall correctly,
perl's memory management was a bit of a problem. Will perl recycle my
memory properly if I keep using the same 3MB string variables over and
over again?

How do I read an entire file into a string? I know how to do it record
by record. Is there a more efficient way?

Thanks,
Siegfried








Re: How to reinvent grep in perl?

2007-10-04 Thread Jonathan Lang
siegfried wrote:
 Thanks, but if I am piping from stdin to stdout I see two problems:

 (1) how do I implement the -n flags that tell me the line number and
 file name where the matches are

Well, as long as you're only piping one file at a time, the line
number part isn't a problem; but I see your point otherwise.  OK;
forget the suggestion about a separate filter.

 (2) how do I make two passes: one to strip out the comments (and
 preserve the original line breaks so I don't screw up the line
 numbers) and the other to actually search for what I am looking for?

If you slurp the entire file into a single string, you need to make
two passes as you describe.  Additionally, this makes the regexes used
to purge comments even messier, since you need to modify them to
preserve newlines within the comments.

If you use the general approach that I outlined, where you're tackling
each file on a line-by-line basis, you don't need to make two passes,
per se.  Instead, write any code or quotes that you find on a line
into a string buffer as you find them, and apply the grep's regex to
that buffer whenever you finish filtering a line:

  #pseudocode
  foreach my $file (@ARGV) {
    if (open FILE, '<', $file) {
      my $line = 0;
      my $context = 'code';
      while (<FILE>) {
        $line++;
        # filter() is not defined here; see the previous post for its logic
        my ($text, $comment) = filter($_, \$context);
        print "$file $line: $_" if $text =~ $pattern;
      }
      close FILE;
    }
  }

The magic is in how you write 'filter()'.  See my previous post for a
summary of the logic behind it.
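
For illustration, here is one way filter() might be sketched. This is an assumption-laden example, not a tested tool: it only knows /* */ and // comments and single- or double-quoted strings, and the search pattern qr/END/ is a placeholder. The context is 'code', 'comment', or the quote character that is currently open.

```perl
use strict;
use warnings;

# Split one line into the text to search and the comment text that was
# dropped.  $$context_ref carries the state from one line to the next.
sub filter {
    my ($line, $context_ref) = @_;
    my ($text, $comment) = ('', '');
    pos($line) = 0;
    while ( pos($line) < length $line ) {
        my $context = $$context_ref;
        if ( $context eq 'comment' ) {
            # Drop everything up to and including the closing "*/".
            if ( $line =~ m{\G(.*?)\*/}gc ) {
                $comment .= $1;
                $$context_ref = 'code';
            }
            elsif ( $line =~ m{\G([^\n]+)}gc ) { $comment .= $1 }
            elsif ( $line =~ m{\G\n}gc )       { $text .= "\n" }
        }
        elsif ( $context eq 'code' ) {
            if    ( $line =~ m{\G//([^\n]*)}gc ) { $comment .= $1 }
            elsif ( $line =~ m{\G/\*}gc )        { $$context_ref = 'comment' }
            elsif ( $line =~ m{\G(["'])}gc ) {
                $text .= $1;
                $$context_ref = $1;       # remember which quote is open
            }
            elsif ( $line =~ m{\G([^/"']+|/)}gc ) { $text .= $1 }
        }
        else {
            # Quote context: only the matching quote character ends it.
            if ( $line =~ m{\G(\\.)}gc ) { $text .= $1 }   # escaped character
            elsif ( $line =~ m{\G(\Q$context\E)}gc ) {
                $text .= $1;
                $$context_ref = 'code';
            }
            elsif ( $line =~ m{\G([^\\"']+|.)}gcs ) { $text .= $1 }
        }
    }
    return ($text, $comment);
}

my $pattern = qr/END/;    # placeholder search pattern
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
    my $context = 'code';
    while ( my $line = <$fh> ) {
        my ($text) = filter($line, \$context);
        print "$file:$.:$line" if $text =~ $pattern;
    }
    close $fh;
}
```

Each branch consumes at least one character, so the loop always reaches the end of the line. The original line is printed, but the match is tested against the filtered text only.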

 How do I read an entire file into a string? I know how to do it record
 by record. Is there a more efficient way?

If you want to slurp an entire file into a single string:

  $string = join '', <FILE>;

In list context, <FILE> will dump its contents into an anonymous list,
which will then be joined together seamlessly.

-- 
Jonathan Dataweaver Lang





How to reinvent grep in perl?

2007-10-03 Thread siegfried
I need to search large amounts of source code and grep is not doing the job.
The problem is that I keep matching stuff in the comments of the
C++/Java/Perl/Groovy/Javascript source code.

Can someone give me some hints on where I might start on rewriting grep in
perl so that it ignores the contents of /* and */ comments?

Thanks,
Siegfried






Re: How to reinvent grep in perl?

2007-10-03 Thread jm
by strange coincidence, i just picked up where i had left off reading
Minimal Perl by Tim Maher, which goes into some detail about
replacing grep (among other functionalities) with either perl
one-liners or scripts, depending on preference/need.  this book may
well be worth your time and money.



On 10/3/07, siegfried [EMAIL PROTECTED] wrote:
 I need to search large amounts of source code and grep is not doing the job.
 The problem is that I keep matching stuff in the comments of the
 C++/Java/Perl/Groovy/Javascript source code.

 Can someone give me some hints on where I might start on rewriting grep in
 perl so that it ignores the contents of /* and */ comments?

 Thanks,
 Siegfried







-- 
since this is a gmail account, please verify the mailing list is
included in the reply to addresses





Re: How to reinvent grep in perl?

2007-10-03 Thread Chas. Owens
On 10/3/07, siegfried [EMAIL PROTECTED] wrote:
 I need to search large amounts of source code and grep is not doing the job.
 The problem is that I keep matching stuff in the comments of the
 C++/Java/Perl/Groovy/Javascript source code.

 Can someone give me some hints on where I might start on rewriting grep in
 perl so that it ignores the contents of /* and */ comments?
snip

perldoc -q comments

will get you halfway there





Re: How to reinvent grep in perl?

2007-10-03 Thread Chas. Owens
On 10/3/07, Jonathan Lang [EMAIL PROTECTED] wrote:
snip
 Chas shows one possibility.  However, that approach generally involves
 slurping the entire file into the perl script, applying the regex to
 the whole thing, and then spitting the result out again.  From what I
 understand, this generally isn't very good form.
snip

The reason slurping files is frowned upon is that you do not know how
large the file may be.  You may be testing with small files, but
production may be using multi-gig files.  Source files are almost
always less than a few megabytes (and if they aren't you have a bigger
problem than a slurp).





Re: How to reinvent grep in perl?

2007-10-03 Thread Jonathan Lang
siegfried wrote:
 I need to search large amounts of source code and grep is not doing the job.
 The problem is that I keep matching stuff in the comments of the
 C++/Java/Perl/Groovy/Javascript source code.

 Can someone give me some hints on where I might start on rewriting grep in
 perl so that it ignores the contents of /* and */ comments?

Instead of rewriting grep, consider writing a comment filter.  Have it
read from standard input and write to standard output; pipe the file
that you want to grep into it, and pipe its output into grep.

As for the file itself: there's probably an elegant way to use regexes
to trim out the comments; the 'perldoc -q comments' suggestion made by
Chas shows one possibility.  However, that approach generally involves
slurping the entire file into the perl script, applying the regex to
the whole thing, and then spitting the result out again.  From what I
understand, this generally isn't very good form.

A messier approach that has the benefit of being less memory-intensive
and of producing output with less of a delay would be to write a
contextual switchboard: read the input stream a little at a time;
decide if a given block of characters is code, quote, or comment; and
send it on to the output stream if it is code or quote.  (The reason
for distinguishing between code and quote is that '/*' and '*/' don't
denote comments when they appear within quotes.)

The key to this approach is the context.  Start the program in 'code'
context, and start reading the input stream a line at a time.  For a
given line, search for the first instance of characters that would
denote the beginning of a comment or quote.  If you don't find
anything, send the whole line to the output stream; if you find
something, send everything before it to the output stream, switch to
quote or comment context as appropriate, and examine the rest of the
line under the new context.

Under quote context, do exactly the same as in code context, except
that what you're looking for is the earliest character combination
that will end the quote.  In particular, you are _not_ looking for the
start of a comment.  When you find the end of the quote, you switch
back to code context for the rest of the line.

Comment context works almost exactly like quote context, except that
you're looking exclusively for the end of the comment, and you throw
away everything prior to it instead of sending it to the output
stream.  For the purpose of maintaining line counts, send a newline
character to the output stream every time you hit the end of a line
while in comment context.  (Alternatively, you might consider writing
the script to dump the comment contents to stderr, which could be
useful if you ever want to grep the contents of the comments instead
of the code.  If you do this be sure to send a newline to stderr
whenever you end a line in code or quote context.)

-- 
Jonathan Dataweaver Lang





Re: How to reinvent grep with perl? (OT: Cygwin grep)

2004-10-10 Thread Harry Putnam
Bakken, Luke [EMAIL PROTECTED] writes:

 Voila. That's most likely your problem - a mismatch between line endings
 and Cygwin mount point type.

And in case you hadn't seen them before... there are at least a few
sets of unix tools for dos/windows.  Cygwin may be the best known, but
I've used Uwin myself for some time and never had a problem with its
grep.

http://www.research.att.com/sw/tools/uwin/






RE: How to reinvent grep with perl? (OT: Cygwin grep)

2004-10-10 Thread Bakken, Luke
  Voila. That's most likely your problem - a mismatch between 
 line endings
  and Cygwin mount point type.
 
 And in case you hadn't seen them before... there are at least a few
 sets of unix tools for dos/windows.  Cygwin may be the best known, but
 I've used Uwin myself for some time and never had a problem with its
 grep.
 
 http://www.research.att.com/sw/tools/uwin/

And another suggestion:

http://gnuwin32.sourceforge.net





How to reinvent grep with perl?

2004-10-09 Thread Siegfried Heintze
My man pages and info pages are not working well and I cannot figure out how
to make grep search for a certain pattern. I even tried egrep and fgrep. So
how do I reinvent grep with perl? Here is my attempt:

 

perl -n -e 'print "$. $_" if /^ *END *$/' *.f

 

This works better than grep, except for the fact it does not print the file
name. How can I make perl print the file name?

How do I do this with (f\|e\|)grep? - whoops - that is off topic. Never mind
then. Just, how do I print the file name using perl?

 

Thanks,

Siegfried



Re: How to reinvent grep with perl?

2004-10-09 Thread Andrew Gaffney
Siegfried Heintze wrote:
 My man pages and info pages are not working well and I cannot figure out how
 to make grep search for a certain pattern. I even tried egrep and fgrep. So
 how do I reinvent grep with perl? Here is my attempt:

There's no need. When you do 'man whatever', you can hit '/', type a search 
term, and hit enter. To search that same term again, '/' then enter will do. 
This works on my Gentoo Linux box.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548



Re: How to reinvent grep with perl?

2004-10-09 Thread Andrew Gaffney
Siegfried Heintze wrote:
 Andrew,
   Thanks. When I hit n to go to the next page, it says "No previous
 regular expression (press RETURN)". So I can only display the first page. I
 have it expanded to the full screen but I still cannot see the portion of
 the display that tells me how to use extended regular expressions.
 Apparently the basic regular expressions don't include ^ and $.

If your 'man' uses 'less' like mine does, you hit Space to go to the next page 
and use the arrow keys to scroll one line at a time. I haven't used 'info' in a 
while, but I believe you can search those with 's'.

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548



Re: How to reinvent grep with perl?

2004-10-09 Thread Harry Putnam
Siegfried Heintze [EMAIL PROTECTED] writes:

 My man pages and info pages are not working well and I cannot figure out how
 to make grep search for a certain pattern. I even tried egrep and fgrep. So
 how do I reinvent grep with perl? Here is my attempt:

  

 perl -n -e 'print "$. $_" if /^ *END *$/' *.f

grep -n '^ *END *$' *.f






Re: How to reinvent grep with perl?

2004-10-09 Thread Harry Putnam
Siegfried Heintze [EMAIL PROTECTED] writes:

 This works better than grep, except for the fact it does not print the file
 name. How can I make perl print the file file name?

How is it better than grep?






RE: How to reinvent grep with perl?

2004-10-09 Thread Siegfried Heintze
Perl works better than grep because the grep statement you give below does
not find any instances of the pattern and perl finds quite a few. I'm using
Cygwin on Win2003Server. Since there is something obviously wrong with my
cygwin implementation of grep, how do I get the file names with perl?

This works, but it sure is ugly. Is there not an easier way to do this with
perl?

perl -e'@ARGV = ("-") unless @ARGV; while(@ARGV){ $ARGV= shift @ARGV;
if(!open(ARGV, $ARGV)){ warn "Cannot open $ARGV: $!\n"; next;} while
(<ARGV>){ print "$ARGV:$.:$_\n" if/^ *END *$/; }}' *.f

Thanks,
Siegfried

-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Harry Putnam
Sent: Saturday, October 09, 2004 4:33 PM
To: [EMAIL PROTECTED]
Subject: Re: How to reinvent grep with perl?

Siegfried Heintze [EMAIL PROTECTED] writes:

 My man pages and info pages are not working well and I cannot figure out
 how to make grep search for a certain pattern. I even tried egrep and
 fgrep. So how do I reinvent grep with perl? Here is my attempt:

 perl -n -e 'print "$. $_" if /^ *END *$/' *.f

grep -n '^ *END *$' *.f






Re: How to reinvent grep with perl?

2004-10-09 Thread Harry Putnam
Siegfried Heintze [EMAIL PROTECTED] writes:

 This works, but it sure is ugly. Is there not an easier way to do this with
 perl?

 perl -e'@ARGV = ("-") unless @ARGV; while(@ARGV){ $ARGV= shift @ARGV;
 if(!open(ARGV, $ARGV)){ warn "Cannot open $ARGV: $!\n"; next;} while
 (<ARGV>){ print "$ARGV:$.:$_\n" if/^ *END *$/; }}' *.f

Doesn't something simple like:

 perl -n -e 'print "$ARGV\n$. $_" if /PATTERN/' *.f

work for you?
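
One catch with $. in a one-liner like this is that it keeps counting across files unless ARGV is closed at the end of each one. Here is a sketch of a script version that prints file:line:text the way "grep -n" does. The function name grep_files is made up for the example, and the early return for an empty file list is my own choice so the sketch never falls back to reading STDIN.

```perl
use strict;
use warnings;

# Return "file:line:text" for every line of @files that matches $pattern.
sub grep_files {
    my ($pattern, @files) = @_;
    return unless @files;          # do not fall back to STDIN in this sketch
    local @ARGV = @files;          # let <> walk the files and set $ARGV
    my @hits;
    while ( my $line = <> ) {
        push @hits, "$ARGV:$.:$line" if $line =~ $pattern;
    }
    continue {
        close ARGV if eof;         # reset $. so each file starts at line 1
    }
    return @hits;
}

# Pattern from the thread; file names come from the command line.
print grep_files( qr/^ *END *$/, @ARGV );
```

The "close ARGV if eof" idiom is described under eof in perlfunc.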

