RE: Out of memory perl script

Jacob Schroeder Tue, 26 Nov 2002 11:03:02 -0800

> I can't see what you're doing because you stubbed out the all important
> part: the bit inside the loop.


Here's what's inside the loop (the spacing may be a little off):


    # Build up the command string appropriately, depending on what options
    # have been set.
    my $command =
        ($rlog_module ne "") ? "cvs -n -d $cvsdir rlog $rlog_module" : "cvs log";
    print "Executing \"$command\"\n" if $debug;

    open (CVSLOG, "$command |") || die "Couldn't execute \"$command\"";
    while (<CVSLOG>)
    {
        if ($search_file == 1)
        {
            # Need to locate the name of the working file
            if (/^RCS file: (.*),v$/)
            {
                $working_file = $1;
                $working_file =~ s/Attic\///g;
                $relative_working_file = "";

                $directory = $working_file;
                $directory =~ s/\//\\/g;
                $directory =~ /.*\\srcroot(.*)\\.*$/;
                $directory = $1;

                print "Directory: \"$directory\"\n" if $debug;

                # Check if this file is to be included or not.
                if (include_file($working_file))
                {
                    # Yep, search for more details on this file.
                    $search_file = 0;

                    if ($branch_tag eq "")
                    {
                        # Main branch to be investigated only.
                        $file_branch_number{$working_file} = "1";
                        $file_number_branch_revisions{$working_file} = 0;
                    }
                    print "Including file \"$working_file\"\n" if $debug;
                }
                else
                {
                    print "Excluding file \"$working_file\"\n" if $debug;
                }
            }
        }
        else
        {
            # Collect the relative part for those runs that don't use
            # -rlog.
            if (/^Working file: (.*)$/)
            {
                $relative_working_file = $1;
            }
            # If we are collecting statistics on a branch, determine the
            # magic branch number for this file.
            elsif ( (! defined $file_branch_number{$working_file}) &&
                 (/^\s*${branch_tag}: ([\d\.]+)\.0\.(\d+)$/) )
            {
                $file_branch_number{$working_file} = "${1}.${2}";
                $file_number_branch_revisions{$working_file} = 0;
                if ($debug)
                {
                    print "Got branch $file_branch_number{$working_file}";
                    print " for file \"$working_file\"\n";
                }
            }
            elsif (/^keyword substitution: b$/)
            {
                # This is a binary file, ignore it.
                undef($file_branch_number{$working_file});
                undef($file_number_branch_revisions{$working_file});
                $search_file = 1;
                print "Excluding binary file \"$working_file\"\n" if $debug;
            }
            # The end-of-file separator is a row of 77 "=" characters.
            elsif (/^={77}$/)
            {
                # End of the log entry for this file, start parsing for the
                # next file.
                $search_file = 1;
                next;
            }
            elsif (/^----------------------------$/)
            {
                # Matched the description separator.  If a branch has been
                # specified, but this file doesn't exist on it, skip this
                # file.
                if (($branch_tag ne "") &&
                    (! defined $file_branch_number{$working_file}))
                {
                    if ($debug)
                    {
                        print "File \"$working_file\" not on branch\n";
                    }
                    $search_file = 1;
                    next;
                }

                # Read the revision line, and record the appropriate
                # information.
                $_ = <CVSLOG>;

                if (/^revision ([\d\.]+)$/)
                {
                    # Record the revision, and whether it is part of the tag
                    # of interest.
                    $revision = $1;
                    if ($revision =~
                        /^$file_branch_number{$working_file}\.\d+$/)
                    {
                        $file_on_branch = 1;
                        $file_number_branch_revisions{$working_file}++;
                    }
                    else
                    {
                        $file_on_branch = 0;
                    }
                    if ($debug)
                    {
                        print "Got branch number: " .
                            "$file_branch_number{$working_file} rev $revision " .
                            "on branch: $file_on_branch\n";
                    }
                }
                else
                {
                    # Problem in parsing, skip it.
                    print "Couldn't parse line: $_\n";
                    $search_file = 1;
                    next;
                }
                    
                $_ = <CVSLOG>;          # Read the "date" line.
                if (/^date: (\d\d\d\d\/\d\d\/\d\d \d\d:\d\d:\d\d);.*author: (.*);.* state: (.*);.*lines: \+(\d+) \-(\d+)$/)
                {
                    # Note for some CVS clients, state dead is presented in
                    # this way, as the following pattern.
                    $date = $1;
                    $author = $2;
                    $state = $3;
                    $lines_added = $4;
                    $lines_removed = $5;
                    $number_lines = $lines_added - $lines_removed;
                    $lines_modified = $lines_added + $lines_removed;

                    $file_version_delta{$working_file}{$revision} =
                        $number_lines;
                    $file_version_state{$working_file}{$revision} = $state;

                    if ($file_on_branch)
                    {
                        # This revision lives on the branch of interest.
                        $line_stats{$date}{$working_file} += $number_lines;
                        $state_stats{$date}{$working_file} = $state;
                        $revision_stats{$date}{$working_file} = $revision;

                        # User stats
                        $user_stats_master{$author}{$date} = $directory;
                        $user_stats{$author}{$date} += $lines_modified;
                        $user_stats_summary{$author} += $lines_modified;

                    }
                }
                elsif (/^date: (\d\d\d\d\/\d\d\/\d\d \d\d:\d\d:\d\d); .* state: dead;$/)
                {
                    # File has been removed.
                    $date = $1;

                    $file_version_delta{$working_file}{$revision} = 0;
                    $file_version_state{$working_file}{$revision} = "dead";
                    
                    if ($file_on_branch)
                    {
                        $line_stats{$date}{$working_file} = 0;
                        $state_stats{$date}{$working_file} = "dead";
                        $revision_stats{$date}{$working_file} = $revision;
                    }
                }
                elsif (/^date: (\d\d\d\d\/\d\d\/\d\d \d\d:\d\d:\d\d);.*author: (.*);.* state: Exp;$/)
                {
                    $date = $1;
                    $author = $2;

                    # Unfortunately, cvs log doesn't indicate the number of
                    # lines an initial revision is created with, so find
                    # this out using the following cvs command.
                    my $lccmd = "";
                    if ($rlog_module ne "")
                    {
                        print "1.working_file = $working_file\n";
                        print "2.working_cvsdir = $working_cvsdir\n";
                        $working_file =~ /^${working_cvsdir}\/(.*)$/;
                        print "3.working_file = $working_file\n";
                        # TODO: FIX THIS LINE
                        $lccmd = "cvs -d $cvsdir co -r $revision -p \"$1\"";
                    }
                    else
                    {
                        $lccmd = "cvs update -r $revision -p \"$relative_working_file\"";
                    }
                    print "Executing $lccmd\n" if $debug;
                    $number_lines = `$lccmd 2>/dev/null | wc -l`;
                    chop $number_lines;
                    $number_lines =~ s/ //g;
                    print "$working_file 1.1 = $number_lines lines\n" if $debug;

                    $file_version_delta{$working_file}{$revision} =
                        $number_lines;
                    $file_version_state{$working_file}{$revision} = "Exp";

                    if ($file_on_branch)
                    {
                        $line_stats{$date}{$working_file} += $number_lines;
                        $state_stats{$date}{$working_file} = "Exp";
                        $revision_stats{$date}{$working_file} = $revision;

                        # User stats
                        $user_stats_master{$author}{$date} = $directory;
                        $user_stats{$author}{$date} += $number_lines;
                        $user_stats_summary{$author} += $number_lines;

                    }
                }
                else
                {
                    print "Couldn't parse: $_";
                }
                if ($debug)
                {
                    print "File \"$working_file\" rev $revision ";
                    print "delta $file_version_delta{$working_file}{$revision} ";
                    print "state $file_version_state{$working_file}{$revision}\n";
                }
            }
        }
    }
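
To show what the main "date" pattern in the loop above captures, here is a
minimal standalone sketch.  The sample line, the author name in it, and the
helper name parse_date_line are made up for illustration; they are not part
of cvsstat.

```perl
use strict;
use warnings;

# Apply the same pattern the loop uses for a "date" line that carries a
# "lines: +N -M" field.  Returns a hash reference of the captured fields,
# or nothing when the line does not match.
sub parse_date_line {
    my ($line) = @_;
    if ($line =~ /^date: (\d\d\d\d\/\d\d\/\d\d \d\d:\d\d:\d\d);.*author: (.*);.* state: (.*);.*lines: \+(\d+) \-(\d+)$/)
    {
        return {
            date    => $1,
            author  => $2,
            state   => $3,
            added   => $4,
            removed => $5,
        };
    }
    return;
}

# Hypothetical sample line in the shape cvs prints for a revision.
my $sample = "date: 2002/11/26 11:03:02;  author: jschroeder;  state: Exp;  lines: +12 -3";

my $fields = parse_date_line($sample);
if ($fields) {
    print "date=$fields->{date} author=$fields->{author} ",
          "state=$fields->{state} +$fields->{added} -$fields->{removed}\n";
}
else {
    print "Couldn't parse: $sample\n";
}
```

For this sample, the loop would record a delta of 9 lines (12 - 3) and 15
lines modified (12 + 3).  A line without a "lines:" field, such as the one
for an initial revision, does not match this pattern and falls through to
the later branches.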




If you guys want to look at the whole code, it's at
http://cvsstat.sourceforge.net/.  I added a little code of my own to
retrieve user stats, but I'm pretty sure the loop above is where I'm having
the problem.  Let me know if you have any more questions about the code.
My OS is Windows 2000 Advanced Server and I'm running the MKS Toolkit.





> > from that, however, when I run this, I get an "Out Of Memory" error as
> > I'm parsing the text.  Is this because I'm using hashes or because there
> > is just a lot of text for cvs rlog on the root?  I get the "Out Of Memory"
> > error after it runs for like 30 minutes or so and if I watch the process
> > it usually gives me that error once it is using about 20 MB of memory.
>
> It takes your program 30 minutes to parse a CVS log?  That's a strong
> indication that something is Very Wrong with your parser.  How big is this
> log and how slow is this computer?

Our CVS repository root directory is about 550 MB with 14,322 files in 1,495
folders.  It is stored on a shared network drive and I access it through the
network.  I use rlog because log requires a checked-out copy of the files to
get a log from.  I like the example you provided, although it will take me a
little while to work through it and see what's going on.  Perl is still a
little confusing to me.

I agree that 30 minutes does sound like a long time, because if I run the
following command:
cvs -n -d \\server\cvs\srcroot rlog . > rlog_output.txt
it only takes maybe 15 minutes to run.  Once it's done running, the
rlog_output.txt file is about 32.5 MB, so it's kind of big, but I thought
knowing that might help us analyze what's going on here.  The computer
should be fast enough: it's a Pentium III 700 MHz with 264 MB RAM and a
10 GB hard drive.
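
One way to separate the time cvs takes from the time the parsing takes would
be a bare-bones pass over the saved file.  This is only a sketch under some
assumptions: count_rlog_entries is a made-up helper name, the file is the
rlog_output.txt produced by the command above, and the two patterns only
count "RCS file:" and "revision" lines rather than doing what cvsstat does.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Count files and revisions in rlog output read from an open filehandle.
# Only two counters are kept, so memory use does not grow with the log.
sub count_rlog_entries {
    my ($fh) = @_;
    my ($files, $revisions) = (0, 0);
    while (my $line = <$fh>) {
        $files++     if $line =~ /^RCS file: /;
        $revisions++ if $line =~ /^revision [\d.]+\s*$/;
    }
    return ($files, $revisions);
}

my $path = "rlog_output.txt";   # assumed to be in the current directory
if (-e $path) {
    open(my $fh, '<', $path) or die "Couldn't open $path: $!";
    my $start = time();
    my ($files, $revisions) = count_rlog_entries($fh);
    close($fh);
    printf "%d files, %d revisions in %.1f seconds\n",
        $files, $revisions, time() - $start;
}
else {
    print "No $path here; run the cvs rlog command first.\n";
}
```

If a pass like this finishes quickly, the slow part is more likely the work
inside the loop (for example the extra cvs command run for each initial
revision) than reading the log itself.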


Thanks for all the responses I received from everyone!  I'm just posting
this one 'mega' response that hopefully answers the questions brought up so
far.  I really appreciate it, because it's easy to get frustrated when
learning a new language, especially if you don't have any gurus to ask for
advice!  :)  Let me know if you guys have more suggestions or see anything
wrong in my looping code above.



