I've run into a strange performance problem that has me stumped. When processing large files, in this case 30MB, my script slows to a crawl. I've boiled the original problem down to a much smaller test case and included the code at the end of the message. Here is the main loop. It reads in a line, modifies it, stores it in a newly created object, and pushes the object onto an array.

        # Read in the file and store some data in some objects.
        while (<FILE>) {

                if ($ARGV[0] eq 'weird') {
                        # No performance problem when deleting only two characters.
                        # This is very weird!
                        s/..//;
                } else {
                        # Delete four characters. Is that so bad?
                        s/....//;
                }

                # Create an object.
                $object = new Object();

                if ($ARGV[0] eq 'odd') {
                        # No performance problem when dummy objects are created.
                        # This is very odd!
                        $dummy = new Dummy();
                        push @dummy, $dummy;
                }

                # Assign the modified line to a data member.
                $object->{V1} = $_;

                # Store the object away.
                push @objects, $object;

                # Every 20000 lines, print out the line number and the time
                # elapsed since the previous report.
                if ($. % 20000 == 0) {
                        $now = time();
                        print "$.\t${\($now - $last)}\n";
                        $last = $now;
                }
        }

Every 20000 lines, the script prints out the line number and the time elapsed since the previous print statement. Here is the output I get. Notice that processing slows down dramatically starting with the batch that ends at line 140000 (it gets much worse with larger files).

        C:\tmp>perl slow.pl

        Reading input file.

        Line#   Sec
        =====   ===
        20000   2
        40000   3
        60000   3
        80000   3
        100000  2
        120000  2
        140000  15
        160000  42
        180000  88

Just for fun, I made some random changes and found that the performance problem mysteriously vanishes. This can be seen by running with an argument of 'weird' or 'odd'. If you look at what 'weird' and 'odd' do in the code, it doesn't seem like either should matter. Here is the weird and odd output.

        C:\tmp>perl slow.pl weird

        Reading input file.

        Line#   Sec
        =====   ===
        20000   2
        40000   3
        60000   2
        80000   2
        100000  2
        120000  1
        140000  1
        160000  1
        180000  2

        C:\tmp>perl slow.pl odd

        Reading input file.

        Line#   Sec
        =====   ===
        20000   2
        40000   3
        60000   3
        80000   3
        100000  2
        120000  2
        140000  2
        160000  1
        180000  3

So, can anyone offer an explanation for this behaviour---or better yet, propose a solution? Thanks in advance!

BTW, I'm running ActivePerl v5.6.1 (build 626) on Windows 2000, 384MB RAM.

The input file can be found here in slow.zip. If you have problems retrieving the file, I can email it.

        http://briefcase.yahoo.com/bc/wtmoose/lst?.dir=/My+Folder+(shared)&.view=l

The source code is below.

-Tim


###########################################
# Just some object with several data members.
package Object;
sub new {
        my $class = shift;
        my $self = {
                'V1' => undef,  'V2' => undef,
                'V3' => undef,  'V4' => undef,
                'V5' => undef,  'V6' => undef,
                'V7' => undef,  'V8' => undef,
                'V9' => undef,  'V10' => undef,
                'V11' => undef, 'V12' => undef,
        };
        return bless $self, $class;
}

# Dummy object that makes it work, oddly enough.
package Dummy;
sub new {
        my $class = shift;
        my $self = {};
        $self->{DUMMY} = undef;
        return bless $self, $class;
}

package main;

print "\nReading input file.\n\n";
print "Line#   Sec\n";
print "=====   ===\n";
open FILE, "<slow.txt" or die "Cannot open slow.txt: $!\n";
$last = time();

# Read in the file and store some data in some objects.
while (<FILE>) {

        if ($ARGV[0] eq 'weird') {
                # No performance problem when deleting only two characters.
                # This is very weird!
                s/..//;
        } else {
                # Delete four characters. Is that so bad?
                s/....//;
        }

        # Create an object.
        $object = new Object();

        if ($ARGV[0] eq 'odd') {
                # No performance problem when dummy objects are created.
                # This is very odd!
                $dummy = new Dummy();
                push @dummy, $dummy;
        }

        # Assign the modified line to a data member.
        $object->{V1} = $_;

        # Store the object away.
        push @objects, $object;

        # Every 20000 lines, print out the line number and the time
        # elapsed since the previous report.
        if ($. % 20000 == 0) {
                $now = time();
                print "$.\t${\($now - $last)}\n";
                $last = $now;
        }
}
###########################################
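
In case slow.txt is hard to get hold of, here is a rough sketch of the same loop that runs against a synthetic in-memory file and reports sub-second timings. It is not the script that produced the numbers above. The helper name time_batches, the line length, the line count, and the batch size are all made up for illustration, and it assumes Perl 5.8 or later (in-memory filehandles, Time::HiRes bundled with the core) rather than the 5.6.1 I'm running. Whether synthetic data triggers the same slowdown is untested.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second time(); bundled with Perl 5.8+

# Hypothetical helper: read lines from $fh, delete the first $strip
# characters of each line with a regex substitution, store each line in
# a new hash-based object, and record the elapsed time per $batch lines.
sub time_batches {
    my ($fh, $strip, $batch) = @_;
    my $re = qr/.{$strip}/;    # same shape as s/....// when $strip is 4
    my (@objects, @report);
    my $count = 0;
    my $last  = time();

    while (my $line = <$fh>) {
        $line =~ s/$re//;
        push @objects, { V1 => $line };

        if (++$count % $batch == 0) {
            my $now = time();
            push @report, [ $count, $now - $last ];
            $last = $now;
        }
    }
    # Return the timing rows and the number of objects kept alive.
    return (\@report, scalar @objects);
}

# Synthetic stand-in for slow.txt: 1000 lines of 60 characters plus a
# newline. These sizes are assumptions, not the real file's layout.
my $data = join '', map { sprintf("%06d", $_) . ('x' x 54) . "\n" } 1 .. 1000;

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($report, $stored) = time_batches($fh, 4, 250);
close $fh;

printf "%d\t%.3f\n", @$_ for @$report;
print "stored $stored objects\n";
```

To get closer to the original case, raise the line count to a few hundred thousand and the batch size to 20000, then compare a $strip of 2 against a $strip of 4.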
