Peter,

I have tried a few different "Perl" ways to grep for patterns in files
(~1500 files totaling ~350MB) cleanly and efficiently. Each time I thought
of something and tried it, it was ridiculously slow.

I have tried a foreach over an array of files, parsing $_ and so on. That
took nearly 5 minutes, which is way too slow for this task. The three (or
four) system calls to egrep work OK, since the load is split across 3 - 4
CPUs (Unix here). If I could write a Perl function that was totally internal
to Perl, with no system calls, and took roughly 45 - 60 seconds of parsing
time, that would be great, but me being me, I doubt it will happen any time
soon.


> -----Original Message-----
> From: Peter Scott [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 07, 2001 12:20
> To: Yacketta, Ronald
> Cc: [EMAIL PROTECTED]
> Subject: RE: ideas to clean this up?
> 
> 
> [My rule is that beginners' questions arising from a thread on the
> beginners' list get answered on the list, FYI.  I may make mistakes that
> others will catch.]
> 
> At 09:37 AM 8/7/01 -0400, Yacketta, Ronald wrote:
> >Peter,
> >
> >Does this look correct?
> >
> >exec egrep, $lookFor, @{$logFiles{$_}} unless $pid=fork; # fork new process for cmd
> 
> Your egrep ought to be quoted, unless you're running without strictness
> enabled, in which case you have more problems...
> 
> That will in fact background an egrep command like you want.  Its output
> will go to wherever your program's output goes.  It won't be captured by
> your program.  If your program is going to continue for any great length of
> time or is going to spawn a lot of processes, make sure to waitpid() for
> your children.
> 
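A minimal sketch of that advice, as I understand it: fork one child per group of files, exec the (quoted) command in each, and waitpid() for every child afterwards. The helper name run_in_background and its argument layout are my own invention for illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: start one background command per group of files,
# then reap every child. Returns a hash of pid => exit code.
sub run_in_background {
    my ($command, $pattern, @groups) = @_;
    my @pids;
    for my $files (@groups) {
        my $pid = fork;
        die "fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the command.
            # Its output goes wherever the parent's output goes.
            exec $command, $pattern, @$files
                or die "exec $command: $!\n";
        }
        push @pids, $pid;    # parent: remember the child and carry on
    }
    my %status;
    for my $pid (@pids) {
        waitpid $pid, 0;             # block until this child has exited
        $status{$pid} = $? >> 8;     # exit code of that child
    }
    return %status;
}
```

It would be called along the lines of run_in_background('egrep', $lookFor, values %logFiles), since each value of %logFiles is already an array reference.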
> Since you're using egrep rather than grep, I assume you want to take
> advantage of its extended regex syntax, therefore your $lookFor may
> contain regex metacharacters.  Make sure you've escaped what needs to be
> escaped, etc.
> 
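For the case where the search text is meant literally and the matching is done in Perl, a small sketch of the escaping step follows. The sample string is made up. Note that this is Perl's escaping; egrep's rules differ (a backslash before some characters has its own meaning there), so for a literal search through grep itself the -F option is probably the safer route.

```perl
use strict;
use warnings;

# Hypothetical search string containing regex metacharacters.
my $literal = 'error (code 1.5)';

# quotemeta backslash-escapes every ASCII character that is not a
# letter, digit or underscore.
my $escaped = quotemeta $literal;

# \Q...\E applies the same escaping inside a pattern.
my $re = qr/\Q$literal\E/;

print "match\n"    if 'error (code 1.5) seen' =~ $re;
print "no match\n" if 'error (code 1x5) seen' !~ $re;
```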
> >I would use a perl regex, but it takes way too much cpu/time.
> 
> Perl's way of searching files with regexen will be just as fast as egrep's,
> possibly faster.
> 
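For comparison, here is roughly what a plain in-process search could look like: compile the pattern once with qr// and read each file line by line. The helper name grep_files and the "file:line" output format are assumptions of mine; I have not timed this against egrep on the real logs.

```perl
use strict;
use warnings;

# Hypothetical helper: return "file:line" for every line matching $pattern.
sub grep_files {
    my ($pattern, @files) = @_;
    my $re = qr/$pattern/;    # compile once, outside both loops
    my @hits;
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "open $file: $!\n";    # skip unreadable files
            next;
        }
        while (my $line = <$fh>) {
            push @hits, join(':', $file, $line) if $line =~ $re;
        }
        close $fh;
    }
    return @hits;
}
```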
> >I need to
> >be able to spawn 3 - 4 utils to parse for the $lookFor in each file within
> >the logFiles array.
> 
> As long as you just want the results of an egrep search to go to your
> program's stdout and stderr, fine.  The day you want your program to get at
> those results, things will get more complicated and you'll probably end up
> doing the regex searching in Perl, still in forked children.
> 
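One way that could look, sketched under my own assumptions: open with '-|' forks a child whose STDOUT is connected to a filehandle in the parent, so each child can do the regex search in Perl and the parent can read the matches back. The helper name parallel_grep is hypothetical, and this version assumes every log line ends in a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: search each group of files in its own child
# process and collect the matching lines in the parent.
sub parallel_grep {
    my ($pattern, @groups) = @_;
    my $re = qr/$pattern/;
    my @readers;
    for my $files (@groups) {
        # A '-|' open forks; the child's STDOUT feeds $fh in the parent.
        my $pid = open my $fh, '-|';
        die "fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: print matches to STDOUT, then exit.
            for my $file (@$files) {
                open my $in, '<', $file or next;
                while (my $line = <$in>) {
                    print "$file:$line" if $line =~ $re;
                }
                close $in;
            }
            exit 0;
        }
        push @readers, $fh;    # parent: keep the read end
    }
    my @hits;
    for my $fh (@readers) {
        push @hits, <$fh>;     # read until the child closes its end
        close $fh;             # close on a piped handle also waits for the child
    }
    return @hits;
}
```

The children run concurrently while the parent drains them one after another, so results come back grouped by child rather than in the order the matches were found.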
> >-Ron
> >
> > > -----Original Message-----
> > > From: Peter Scott [mailto:[EMAIL PROTECTED]]
> > > Sent: Monday, August 06, 2001 15:20
> > > To: Yacketta, Ronald; Beginners (E-mail)
> > > Subject: RE: ideas to clean this up?
> > >
> > >
> > > At 02:51 PM 8/6/01 -0400, Yacketta, Ronald wrote:
> > > >Thanxs!
> > > >
> > > >now off to modify my exec code that parses an entire array of files :)
> > >
> > > Of course, the arrayrefs could equally well have been stored in an array
> > > instead of a hash.  There's a thin justification for a hash in the absence
> > > of any other context, but the actual context could easily change that.
> > >
> > > And your exec code ought not to have to change.  If you're used to doing
> > > something that says
> > >
> > >          foo (@files)
> > >
> > > then just do instead
> > >
> > >          foo (@{$logFiles{$key}})
> > >
> > > where $key is one of the hash keys - obviously now you can loop through all
> > > of them.
> > >
> > > > > This may seem a little obvious, but...
> > > > >
> > > > > my %logFiles;
> > > > > for my $key (1 .. 6) {
> > > > >    opendir DIR, "../logs/set$key" or die "opendir ../logs/set$key: $!\n";
> > > > >    push @{$logFiles{$key}}, map "../logs/set$key/$_",
> > > > >                             grep !/^\.\.?$/, sort readdir DIR;
> > > > >    closedir DIR;
> > > > > }
> > > > >
> > > > > Now the filenames are in arrays which are referenced from the values of the
> > > > > hash %logFiles (keys are 1 through 6, but maybe you want to use the
> > > > > directory name instead).  I took the liberty of removing the usually
> > > > > useless directory entries and sorting, since you'll probably want them
> > > > > sorted later.
> > >
> > > --
> > > Peter Scott
> > > Pacific Systems Design Technologies
> > > http://www.perldebugged.com
> > >
> 
> --
> Peter Scott
> Pacific Systems Design Technologies
> http://www.perldebugged.com
> 
