On Tue, Nov 26, 2002 at 10:57:31AM -0800, Jacob Schroeder wrote:
> here's what's inside the loop (the spacing is a little off...):
Yeah, what you have is a lot more complicated than it needs to be.
Going line-by-line gives you a very narrow view of the file, so you have to
keep all this extra state around to remember where you are in the file, and
you wind up with all that complicated, nested logic. If you go record by
record, most of the state problem goes away.
So instead of reading in a single line, you read in all the logs for a
single file in one shot, i.e. everything up to the next ========== line. $/
controls what perl thinks of as the end of a "line". Normally it's a
newline (\n), but that's just convention and you can change it.
# One "line" == One file's logs
local $/ =
'=============================================================================';
while(<CVSLOG>) {
Then you break that record into the file header and the individual
revisions by splitting on the -------- lines.
    my($head, @revisions) = split /^----------------------------$/m, $_;
Now you've got the header lines (i.e. everything from "RCS File:" to
"description:") separated from each individual revision's lines (from
"revision 1.11" to the log message) for a single file. Then it's easy to
take them apart with individual subroutines.
    my($file_headers) = parse_head($head);

    foreach my $revision (@revisions) {
        my($rev, $info, $log) = parse_revision($revision);
        ...
    }
}
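(CVSLOG there is just whatever filehandle has the rlog output on it. If you
want to run cvs from inside the program, a piped open does the trick --
this is just a sketch, and $cvsroot is a made-up variable, so put your own
repository path in it:)

# Run "cvs rlog" and read its output through the CVSLOG filehandle.
# $cvsroot is hypothetical, fill in your own repository path.
open(CVSLOG, "cvs -n -d $cvsroot rlog . |")
    or die "Can't run cvs rlog: $!";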
and that's it. The rest is just deciding how you want the resulting data to
be structured. parse_head() and parse_revision() are fairly simple because
they focus on a single part of the log file.
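Just to give you the flavor, here's a rough sketch of what those two subs
might look like. The hash keys and the exact regexes are my own guesses
against stock rlog output, so adjust them to taste:

sub parse_head {
    my($head) = @_;
    my %info;

    # Pluck out the simple "Key: value" header lines.
    ($info{rcs_file}) = $head =~ /^RCS file:\s*(.*)/m;
    ($info{working})  = $head =~ /^Working file:\s*(.*)/m;
    ($info{head_rev}) = $head =~ /^head:\s*(.*)/m;

    # Everything after the "description:" line is the description.
    ($info{desc}) = $head =~ /^description:\s*\n(.*)/ms;

    return \%info;
}

sub parse_revision {
    my($revision) = @_;
    $revision =~ s/^\s+//;  # trim the newline left over from the split

    # First line is "revision 1.11", second is the date/author line,
    # everything else is the log message.
    my($rev_line, $info_line, @log) = split /\n/, $revision;

    my($rev) = $rev_line =~ /^revision\s+(\S+)/;

    my %info;
    ($info{date})   = $info_line =~ /date:\s*([^;]+);/;
    ($info{author}) = $info_line =~ /author:\s*([^;]+);/;

    return ($rev, \%info, join("\n", @log));
}

One wart to watch for: since $/ is the ====== line, the last revision in
each record will have that separator stuck on the end of its log message,
so you'll want to strip it off.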
> Our CVS repository root directory is about 550 MB with 14,322 files in 1,495
> folders. It is stored on a shared network drive and I access it through the
> network. I use the rlog because log requires a checked out copy of the
> files to get a log from. I like the example you provided, although it will
> take me a little while to work through it and see what's going on, perl
> still is a little confusing to me.
There's nothing going on in the program that's really perl-specific; it's
just that the syntax is different. The end-of-line change (i.e. setting $/)
is really the only odd bit, and even that's possible in other languages,
just a lot more difficult.
> I agree that 30 minutes does sound like a long time because if I run the
> following command:
> cvs -n -d \\server\cvs\srcroot rlog . > rlog_output.txt
> it only takes maybe 15 minutes to run. Once it's done running, the
> rlog_output.txt file is about 32.5 MB, so it's kind of big, but I thought
> knowing that might help us analyze what's going on here. The computer
> should be fast enough, it's a Pentium III 700 MHz with 264 MB RAM and a
> 10 GB hard drive.
Hmm, the biggest rlog I can muster is 1.1 megs, but it only takes 5 seconds
to spit out. This is from a local CVS repository. The fact that you appear
to be pulling from a remote repository is likely the big slowdown. Try
running the program directly on the CVS server instead if you can. Even so,
if it takes 15 minutes to pull down the log, that means your perl program
is taking an additional 15 minutes to parse it. As mentioned above, it's
likely because your code is just too complicated.
The little parser I posted earlier takes 11 seconds to handle 1000 files and
4400 revisions, and the perl process eats about 6.5 megs (about 3 of which
is actual data, the rest is perl itself). Half of that time is spent just on
running "cvs rlog". This is on a G3 Powerbook 266 with 310 MB RAM, 620 MB
VRAM and a 40 gig EIDE drive, so roughly what you have. Extrapolated out
(about 3 megs of parsed data per 1.1 megs of rlog, and your rlog is 32.5
megs) that would take about 100 megs to store the whole parsed log. You
should be ok on your machine as long as you have some virtual memory to
pick up the slack.
--
Michael G. Schwern <[EMAIL PROTECTED]> http://www.pobox.com/~schwern/
Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One
Any sufficiently encapsulated hack is no longer a hack.