Brian and Bill -- thanks immensely to you both. I'm learning a lot in the 
process -- and have over the last year just from reading your and others' 
postings. I understood most of the comments and really appreciate the 
advice. The problem still persists -- I think I know where it is coming 
from -- but not how to fix it.

First, a narrower question. I am not sure I completely follow your 
comment on local and global variables. If I declare a variable inside a 
loop (e.g. my $newvariable), it will not be available outside the loop 
(which is good if I don't need it so it won't consume memory). Looking at 
your edits of my program -- this makes sense.
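
To check that I have the scoping right, here is a minimal sketch of how I 
understand it. The names @seen and $i are made up for the illustration and 
are not from my program; the string eval is only there so that the 
out-of-scope reference does not stop the whole script from compiling under 
strict.

```perl
use strict;
use warnings;

my @seen;                              # declared outside, so it survives the loop
for my $i (1 .. 3) {
    my $newvariable = $i * 10;         # a fresh lexical on every pass
    push @seen, $newvariable;
}

# $newvariable is out of scope here. Under "use strict" a direct reference
# would be a compile-time error, so the check is done inside a string eval.
my $ok = eval 'my $x = $newvariable; 1';

print "visible outside loop: ", ($ok ? "yes" : "no"), "\n";
print "values seen inside loop: @seen\n";
```
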

Ok, now for my persistent problem. As the program runs, I can see it use 
more and more memory -- until it crashes. I think (and could be wrong) that 
the program is not deleting the tree when it is done. I will enclose 
the program below, but let me explain what I have done. The program will 
eventually read different input files -- but for testing it uses the same 
input file over and over. At the moment (see below) the
        my $root = HTML::TreeBuilder->new;
        $root->parse($doc);
        $root->eof();
are in the loop. I have tried to include
        $root->delete();
at the end of the loop, but with no effect.

If I move the commands
        my $root = HTML::TreeBuilder->new;
        $root->parse($doc);
        $root->eof();
outside the loop -- I don't have a memory problem. Thus I think the program 
is not releasing the memory of the old tree, when it builds the new one. I 
can't have the $root->parse($doc) command outside the loop, as when I 
actually use the program -- it will read different files and build the tree 
for each one.

P.S.
        I couldn't figure out the commands
                my @vals =  map {s/[,$ =]//g} @col_asset[0,-1];
                print join(",", @vals), "\n";
        If you could direct me to a manual, that would be fine as well.
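
For reference, here is a small sketch of what those two lines appear to do, 
using made-up sample values rather than anything from my input file. As far 
as I can tell, s///g hands back the number of substitutions rather than the 
cleaned string, so map collects counts (and changes the original elements in 
place). The second map is only my assumption about what was intended: it 
copies each value, cleans the copy, and returns the copy. The relevant 
manual pages seem to be "perldoc -f map" and the s/// section of 
"perldoc perlop".

```perl
use strict;
use warnings;

# Made-up sample values, standing in for the cells of a "total assets" row.
my @col_asset = ('$1,234,567', '987', '$2,345,678');

# As written in the quoted lines: the block's value is the result of s///g,
# which is a substitution count, and $_ is modified in place.
my @copy   = @col_asset;               # work on a copy so @col_asset stays intact
my @counts = map { s/[,\$ =]//g } @copy[0, -1];

# One way to get cleaned strings instead: copy, substitute, return the copy.
my @vals = map { (my $v = $_) =~ s/[,\$ =]//g; $v } @col_asset[0, -1];

print join(",", @counts), "\n";        # substitution counts
print join(",", @vals), "\n";          # cleaned first and last values
```
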


Program ----------

use strict;
use warnings;
use HTML::TreeBuilder;

my $txtfile = 'D:/res/edgar/10k/2178_0000002178-06-000013.txt';
my $csvfile = 'D:/res/edgar/match/test2.csv';


# open the CSV file for writing

open OUT, ">$csvfile" or die "create csv: $!($^E)";
select ((select (OUT), $| = 1)[0]);                                     # unbuffer CSV write

# open the text file for reading
open IN, $txtfile or die "open $txtfile: $!($^E)";
my $doc = join '', <IN>;                                                # read file into $doc variable
close IN;

my $total = 0;
while ($total <= 3000) {

        my $asset_s=0;
        my $asset_s2=0;

        my $root = HTML::TreeBuilder->new;
        $root->parse($doc);
        $root->eof();

        OUTER_LOOP:
        foreach my $table ($root->find_by_tag_name('TABLE')) {          # put tables into array, then put each one in $table
                my $txt = $table->as_text_trimmed;
                next if ($txt !~ /total asset/is || $txt !~ /(\d|,){4,12}/is);  # skip items not of interest
                my @col_asset;                                                  # my @col_asset = ();
                foreach my $row ($table->find_by_tag_name('tr')) {
                        next if $row->as_text_trimmed !~ /^total asset/i;       # skip rows not of interest
                        foreach my $column ($row->find_by_tag_name('td')) {
                                my $col_text = $column->as_text_trimmed;
                                if ($col_text =~ /[\d,\.]{4,12}/) {
                                        push @col_asset, $col_text;
                                        }
                                }
                        $asset_s = $col_asset[0];
                        $asset_s2 = $col_asset[-1];
                        last;
                        }

                $asset_s =~ s/[,$ =]//g;                                # drop ',', '$', ' ', & '='
                $asset_s2 =~ s/[,$ =]//g;
                last OUTER_LOOP;                                        # only do 1st table
                }

        $total++;
        print OUT "$asset_s $asset_s2 $total \n";


        }
close OUT;

__END__

