At 02:02 PM 8/7/01 -0400, Yacketta, Ronald wrote:
>Peter,
>
>I have tried a few different "perl" ways to grep for patterns in files
>(~1500 files totaling ~350MB) cleanly and efficiently. Each time I thought
>of something / tried something it was ridiculously slow.
>
>I have tried a foreach on an array of files and parsing the $_ etc. That
>took nearly 5 mins, which is way too slow for this task. The three (or
>four) system calls to egrep work OK, seeing the load is split across
>3 - 4 CPUs (Unix here). If I could write a Perl function that was totally
>internal to Perl, with no system calls, and took roughly 45 - 60 seconds
>to parse, that would be great, but me being me I doubt it will happen any
>time soon.

Okay, try this really simple approach.  Also, are the patterns you're 
searching for really patterns or are they just substrings?  Might be faster 
to use index() instead of // in that case.

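use strict;
use warnings;

$| = 1;   # unbuffered output, so lines from the children are not held back
# Next three lines were just for testing...
forkgrep('foo', glob 'a/*');
forkgrep('bar', glob 'b/*');
forkgrep('b.*z', glob 'c/*');
# Block until every child has been reaped; waitpid returns -1 once none remain.
# (A WNOHANG wait returns 0 while children are still running, which would end
# the loop at once, and -1 is true, which would spin forever afterwards.)
1 while waitpid(-1, 0) > 0;

sub forkgrep {
   my $pid = fork;
   die "fork: $!\n" unless defined $pid;
   return if $pid;                # parent: go and start the next search
   (my $regex, @ARGV) = @_;       # child: <> reads the files now in @ARGV
   my $re = qr/$regex/;           # compile the pattern once
   while (<>) {
     print "$ARGV: $_" if /$re/;
   }
   exit;
}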
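
If the things you're looking for turn out to be plain substrings, the index() suggestion above could look something like the sketch below.  The name grep_substring and the sample lines are made up for illustration; they are not part of your code or mine.  Whether it actually beats a regex on your logs is something to measure rather than assume.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the lines in @$lines
# that contain $needle as a literal substring.
sub grep_substring {
    my ($needle, $lines) = @_;
    # index() returns -1 when the substring is absent and 0 for a match at
    # the very start of the line, so compare against 0 explicitly.
    return grep { index($_, $needle) >= 0 } @$lines;
}

# Made-up sample data: the '.' in the needle is matched literally here,
# whereas in a regex it would match any character.
my @sample = ("foo.bar ok\n", "fooxbar no\n", "nothing here\n");
print grep_substring('foo.bar', \@sample);
```

In forkgrep the same test would replace the regex match inside the while loop, assuming the first argument is then a literal string rather than a pattern.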



> > -----Original Message-----
> > From: Peter Scott [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, August 07, 2001 12:20
> > To: Yacketta, Ronald
> > Cc: [EMAIL PROTECTED]
> > Subject: RE: ideas to clean this up?
> >
> >
> > [My rule is that beginners' questions arising from a thread on the
> > beginners' list get answered on the list, FYI.  I may make mistakes that
> > others will catch.]
> >
> > At 09:37 AM 8/7/01 -0400, Yacketta, Ronald wrote:
> > >Peter,
> > >
> > >Does this look correct?
> > >
> > >exec egrep, $lookFor, @{$logFiles{$_}} unless $pid=fork; # fork new process for cmd
> >
> > Your egrep ought to be quoted, unless you're running without strictness
> > enabled, in which case you have more problems...
> >
> > That will in fact background an egrep command like you want.  Its output
> > will go to wherever your program's output goes.  It won't be captured by
> > your program.  If your program is going to continue for any great length
> > of time or is going to spawn a lot of processes, make sure to waitpid()
> > for your children.
> >
> > Since you're using egrep rather than grep, I assume you want to take
> > advantage of its extended regex syntax, therefore your $lookFor may
> > contain regex metacharacters.  Make sure you've escaped what needs to be
> > escaped, etc.
> >
> > >I would use a perl regex, but it takes way too much cpu/time.
> >
> > Perl's way of searching files with regexen will be just as fast as
> > egrep's, possibly faster.
> >
> > >I need to be able to spawn 3 - 4 utils to parse for the $lookFor in
> > >each file within the logFiles array.
> >
> > As long as you just want the results of an egrep search to go to your
> > program's stdout and stderr, fine.  The day you want your program to get
> > at those results, things will get more complicated and you'll probably
> > end up doing the regex searching in Perl, still in forked children.
> >
> > >-Ron
> > >
> > > > -----Original Message-----
> > > > From: Peter Scott [mailto:[EMAIL PROTECTED]]
> > > > Sent: Monday, August 06, 2001 15:20
> > > > To: Yacketta, Ronald; Beginners (E-mail)
> > > > Subject: RE: ideas to clean this up?
> > > >
> > > >
> > > > At 02:51 PM 8/6/01 -0400, Yacketta, Ronald wrote:
> > > > >Thanxs!
> > > > >
> > > > >now off to modify my exec code that parses an entire array of files :)
> > > >
> > > > Of course, the arrayrefs could equally well have been stored in an
> > > > array instead of a hash.  There's a thin justification for a hash in
> > > > the absence of any other context, but the actual context could easily
> > > > change that.
> > > >
> > > > And your exec code ought not to have to change.  If you're used to
> > > > doing something that says
> > > >
> > > >          foo (@files)
> > > >
> > > > then just do instead
> > > >
> > > >          foo (@{$logFiles{$key}})
> > > >
> > > > where $key is one of the hash keys - obviously now you can loop
> > > > through all of them.
> > > >
> > > > > > This may seem a little obvious, but...
> > > > > >
> > > > > > my %logFiles;
> > > > > > for my $key (1 .. 6) {
> > > > > >    opendir DIR, "../logs/set$key" or die "opendir ../logs/set$key: $!\n";
> > > > > >    push @{$logFiles{$key}}, map "../logs/set$key/$_",
> > > > > >                             grep !/^\.\.?$/, sort readdir DIR;
> > > > > >    closedir DIR;
> > > > > > }
> > > > > >
> > > > > > Now the filenames are in arrays which are referenced from the values
> > > > > > of the hash %logFiles (keys are 1 through 6, but maybe you want to
> > > > > > use the directory name instead).  I took the liberty of removing the
> > > > > > usually useless directory entries and sorting, since you'll probably
> > > > > > want them sorted later.
> > > >
> > > > --
> > > > Peter Scott
> > > > Pacific Systems Design Technologies
> > > > http://www.perldebugged.com
> > > >
> >
> > --
> > Peter Scott
> > Pacific Systems Design Technologies
> > http://www.perldebugged.com
> >
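
On the quoted point about the day your program needs to get at the results itself: one way that could go is a forking open, where each child's STDOUT is piped back to the parent.  This is only a rough sketch under my own assumptions; forkgrep_collect and its list-of-arrayrefs argument are invented for illustration, and each group is assumed to name at least one file (with none, <> would fall back to reading STDIN).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): fork one child per
# [regex, file, file, ...] group and hand the matching lines to the parent.
sub forkgrep_collect {
    my @groups = @_;
    my @readers;
    for my $group (@groups) {
        my ($regex, @files) = @$group;
        # open with '-|' forks; the child's STDOUT is piped to $fh in the parent.
        my $pid = open(my $fh, '-|');
        die "fork: $!\n" unless defined $pid;
        if ($pid == 0) {                  # child
            my $re = qr/$regex/;          # compile the pattern once
            local @ARGV = @files;         # <> reads these files
            while (<>) {
                print "$ARGV: $_" if /$re/;
            }
            exit 0;
        }
        push @readers, $fh;               # parent: remember the read end
    }
    my @matches;
    for my $fh (@readers) {
        push @matches, <$fh>;             # drain each child's output in turn
        close $fh;                        # close also waits for that child
    }
    return @matches;
}
```

A call might look like forkgrep_collect(['foo', glob 'a/*'], ['bar', glob 'b/*']).  All the children are started before the parent reads anything, so the searches still overlap; a child with a lot of output simply blocks on its pipe until the parent gets round to it.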

--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com

