Peter,
I have tried a few different "Perl" ways to grep for patterns in files
(~1500 files totaling ~350MB) cleanly and efficiently. Each time I thought
of something and tried it, it was ridiculously slow.
I tried a foreach over an array of files, parsing $_ etc. That took nearly
5 minutes, which is way too slow for this task. The three (or four) system
calls to egrep work OK, since the load is split across 3 - 4 CPUs (Unix
here). If I could write a Perl function that was totally internal to Perl,
with no system calls, and took roughly 45 - 60 seconds of parsing time, that
would be great, but me being me, I doubt it will happen any time soon.
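
For reference, a minimal sketch of what such an all-Perl version might look
like: the file list is dealt out to a few forked children, each child does
the regex matching itself, and the parent collects the matching lines through
pipes and reaps every child with waitpid(). The name parallel_grep and the
worker count of 4 are assumptions made for illustration, not anything from
the existing script, and nothing here has been timed against the egrep
approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: search the files in @$files for $regex using
# $workers forked children, and return the matching lines to the caller.
sub parallel_grep {
    my ($regex, $files, $workers) = @_;
    my @readers;

    for my $w (0 .. $workers - 1) {
        # Deal the files out round-robin so each child gets a similar share.
        my @slice = @{$files}[ grep { $_ % $workers == $w } 0 .. $#$files ];

        pipe(my $reader, my $writer) or die "pipe: $!\n";
        my $pid = fork;
        die "fork: $!\n" unless defined $pid;

        if ($pid == 0) {
            # Child: search its share of the files, write matches to the pipe.
            close $reader;
            for my $file (@slice) {
                open my $in, '<', $file
                    or do { warn "open $file: $!\n"; next };
                while (my $line = <$in>) {
                    print {$writer} "$file:$line" if $line =~ $regex;
                }
                close $in;
            }
            close $writer;      # flushes buffered matches to the parent
            POSIX::_exit(0);    # leave without running the parent's END blocks
        }

        # Parent: keep only the read end of this child's pipe.
        close $writer;
        push @readers, [ $pid, $reader ];
    }

    # Collect each child's output, then reap it so no zombies are left.
    my @matches;
    for my $r (@readers) {
        my ($pid, $reader) = @$r;
        push @matches, <$reader>;
        close $reader;
        waitpid $pid, 0;
    }
    return @matches;
}

# Command-line use: first argument is the pattern, the rest are files.
if (@ARGV) {
    my ($pattern, @files) = @ARGV;
    print parallel_grep(qr/$pattern/, \@files, 4);
}
```

The parent reads the pipes one after another, so the output comes back
grouped by child rather than in file order. Whether this gets anywhere near
the 45 - 60 second target will depend on the disks and the pattern.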
> -----Original Message-----
> From: Peter Scott [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 07, 2001 12:20
> To: Yacketta, Ronald
> Cc: [EMAIL PROTECTED]
> Subject: RE: ideas to clean this up?
>
>
> [My rule is that beginners' questions arising from a thread on the
> beginners' list get answered on the list, FYI. I may make
> mistakes that
> others will catch.]
>
> At 09:37 AM 8/7/01 -0400, Yacketta, Ronald wrote:
> >Peter,
> >
> >Does this look correct?
> >
> >exec egrep, $lookFor, @{$logFiles{$_}} unless $pid=fork; # fork new process for cmd
>
> Your egrep ought to be quoted, unless you're running without strictness
> enabled, in which case you have more problems...
>
> That will in fact background an egrep command like you want. Its output
> will go to wherever your program's output goes. It won't be captured by
> your program. If your program is going to continue for any great length of
> time or is going to spawn a lot of processes, make sure to waitpid() for
> your children.
>
> Since you're using egrep rather than grep, I assume you want to take
> advantage of its extended regex syntax, therefore your $lookFor may
> contain regex metacharacters. Make sure you've escaped what needs to be
> escaped, etc.
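
If the matching ends up being done in Perl instead of egrep, the escaping
mentioned above can be left to \Q...\E (the quotemeta operator). A minimal
sketch, assuming the search string is meant literally; the helper name
literal_regex and the sample strings are made up for illustration. The
escaping rules of egrep itself are not the same as Perl's, so this only
applies to Perl's own regex engine.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a string into a regex that matches it
# literally, with every regex metacharacter escaped by \Q...\E.
sub literal_regex {
    my ($string) = @_;
    return qr/\Q$string\E/;
}

my $lookFor = literal_regex('error (code 42)');
my $line    = 'fatal error (code 42) at startup';
print "matched\n" if $line =~ $lookFor;
```

Without the escaping, the parentheses would be treated as a capture group
and the pattern would match "error code 42" instead of the literal text.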
>
> >I would use a perl regex, but it takes way too much cpu/time.
>
> Perl's way of searching files with regexen will be just as fast as egrep's,
> possibly faster.
>
> >I need to be able to spawn 3 - 4 utils to parse for the $lookFor in each
> >file within the logFiles array.
>
> As long as you just want the results of an egrep search to go to your
> program's stdout and stderr, fine. The day you want your program to get at
> those results, things will get more complicated and you'll probably end up
> doing the regex searching in Perl, still in forked children.
>
> >-Ron
> >
> > > -----Original Message-----
> > > From: Peter Scott [mailto:[EMAIL PROTECTED]]
> > > Sent: Monday, August 06, 2001 15:20
> > > To: Yacketta, Ronald; Beginners (E-mail)
> > > Subject: RE: ideas to clean this up?
> > >
> > >
> > > At 02:51 PM 8/6/01 -0400, Yacketta, Ronald wrote:
> > > >Thanxs!
> > > >
> > > >now off to modify my exec code that parses an entire array of files :)
> > >
> > > Of course, the arrayrefs could equally well have been stored in an array
> > > instead of a hash. There's a thin justification for a hash in the absence
> > > of any other context, but the actual context could easily change that.
> > >
> > > And your exec code ought not to have to change. If you're used to doing
> > > something that says
> > >
> > > foo (@files)
> > >
> > > then just do instead
> > >
> > > foo (@{$logFiles{$key}})
> > >
> > > where $key is one of the hash keys - obviously now you can loop through
> > > all of them.
> > >
> > > > > This may seem a little obvious, but...
> > > > >
> > > > > my %logFiles;
> > > > > for my $key (1 .. 6) {
> > > > >     opendir DIR, "../logs/set$key"
> > > > >         or die "opendir ../logs/set$key: $!\n";
> > > > >     push @{$logFiles{$key}}, map "../logs/set$key/$_",
> > > > >         grep !/^\.\.?$/, sort readdir DIR;
> > > > >     closedir DIR;
> > > > > }
> > > > >
> > > > > Now the filenames are in arrays which are referenced from the values
> > > > > of the hash %logFiles (keys are 1 through 6, but maybe you want to
> > > > > use the directory name instead). I took the liberty of removing the
> > > > > usually useless directory entries and sorting, since you'll probably
> > > > > want them sorted later.
> > >
> > > --
> > > Peter Scott
> > > Pacific Systems Design Technologies
> > > http://www.perldebugged.com
> > >
>
> --
> Peter Scott
> Pacific Systems Design Technologies
> http://www.perldebugged.com
>