From: <[EMAIL PROTECTED]>
> Thanks for your response. Now I can concentrate on how to hack the
> code. What is your take on how to represent the table entries
> (cells)? What is the most efficient way to associate each cell with
> its parent header?

It's a bit hard to give good suggestions if I do not understand the
data. Are all the data yearly like the stuff in page6.prn?

If so, it would IMHO be best to have one table with the headers and
another with the data. So you'd have something like

HEADERS
ID | Name
1  | Domestic nonfinancial sectors: Total
2  | Domestic nonfinancial sectors: Federal government
3  | Domestic nonfinancial sectors: Nonfederal: Total nonfederal
...

and

DATA
HeaderID | Year | Value
1        | 1969 | 7.2
2        | 1969 | -1.1
...

Parsing the headers from the document will be tricky. Since the
columns are not all the same width (and I believe the Nth column in
one report will be in a different place than in another), you'll have
to start by looking at the last line and finding the places to split
the lowest-level headers.

Actually .. I felt like doing some Perl again today ... you can find
the code attached. It does the tricky job of extracting the complete
headers; extracting the values and inserting them into the database
should be simple.

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery

use strict;

open my $in, '<', 'c:\temp\page6.prn' or die "Cannot open page6.prn: $!";
my $report_text = do { local $/; <$in> };   # slurp the whole file
close $in;

$report_text =~ s/\n\n+/\n/g;          # remove empty lines
$report_text =~ s/^.*^--------+\n//sm; # remove the header
#print "=====================================\n$report_text\n=========================================\n";

my @report = split /\n/, $report_text; # split to lines
#print "lines: "[EMAIL PROTECTED]"\n";

# split the last line into fields (including spaces!)
my @last_line = ($report[-1] =~ /(.*?-?\d+(?:\.\d+)?)/g);
#print join("\n", @last_line),"\n";

my @lengths = map {length($_)} @last_line;
my @end_pos = do {
    my $sum = 0; # this variable is local to the do block, I keep the sum of the lengths in it
    map {$sum += $_} @lengths
};
#print join(", ", @lengths),"\n";
#print join(", ", @end_pos),"\n";

my (@section_lines, @header_lines);

# move the rows starting with spaces and -- to @section_lines array
while ($report[0] =~ /^\s+--/) {
    push @section_lines, shift(@report);
}
print "----------------------------\n", join("\n", @section_lines), "\n----------------------------\n";

# move the rows starting with spaces followed by text to @header_lines array
while ($report[0] =~ /^\s+\w/) {
    push @header_lines, shift(@report);
}
#print "----------------------------\n", join("\n", @header_lines), "\n----------------------------\n";

shift(@report); # remove the ________________________

my $unpack_format = 'A' . join( 'A', @lengths);

# split the first line of column headers
my @headers = unpack( $unpack_format, shift(@header_lines));

# append the matching piece of every following header line to each column
foreach my $header_line (@header_lines) {
    my @next = unpack( $unpack_format, $header_line);
    for(my $i=0; $i <= $#headers; $i++) {
        $headers[$i] .= $next[$i];
    }
}

# trim the headers and collapse runs of whitespace
foreach (@headers) {
    s/^\s+//;
    s/\s+$//;
    s/\s+/ /g;
}
#print "----------------------------\n", join("\n", @headers), "\n----------------------------\n";

# prepend the section titles, starting with the line closest to the headers
foreach my $section_line (reverse(@section_lines)) {
    print "\$section_line=$section_line\n";
    for(my $i=0; $i <= $#headers; $i++) {
        # cut the section line at the right edge of column $i
        my ($begin, $end) = (
            substr($section_line, 0, $end_pos[$i]),
            substr($section_line, $end_pos[$i])
        );
        if ($begin =~ /-([^-]*\w\s*)$/ and (my $tmp = $1) and $end =~ /^(\s*\w[^-]*)-/) {
            # the title is split by the column edge
            $headers[$i] = $tmp . $1 . ': ' . $headers[$i];
        } elsif ($begin =~ /-(\w[^-]*)-+$/) {
            # the title lies to the left of the column edge
            $headers[$i] = $1 . ': ' . $headers[$i];
        } elsif ($end =~ /^-*(\w[^-]*)-/) {
            # the title lies to the right of the column edge
            $headers[$i] = $1 . ': ' . $headers[$i];
        }
    }
}
print "----------------------------\n", join("\n", @headers), "\n----------------------------\n";
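
The remaining step, extracting the values, could look roughly like the sketch below. It reuses the same idea as the script above (field widths taken from the last line, then unpack) and produces rows shaped like the DATA table. Everything specific here is an assumption, not something taken from page6.prn: the two sample lines are made up, the first column is assumed to hold the year, and the names field_lengths and rows_from_lines are hypothetical helpers.

```perl
use strict;
use warnings;

# Made-up sample rows; the real layout of page6.prn is assumed, not known.
my @data_lines = (
    '1969    7.2   -1.1    8.3',
    '1970   10.5    3.2    7.3',
);

# Field widths taken from one line, as in the script above:
# each field is its leading spaces plus one number.
sub field_lengths {
    my ($line) = @_;
    return map { length($_) } ($line =~ /(.*?-?\d+(?:\.\d+)?)/g);
}

# Turn the data lines into [header_id, year, value] rows.
# Assumption: the first column is the year, the rest are values,
# and header IDs simply number the value columns from 1.
sub rows_from_lines {
    my (@lines) = @_;
    my @lengths = field_lengths($lines[-1]);
    my $format  = join '', map { "A$_" } @lengths;
    my @rows;
    foreach my $line (@lines) {
        my ($year, @values) = unpack $format, $line;
        s/^\s+// for $year, @values;         # 'A' strips only trailing spaces
        for my $i (0 .. $#values) {
            next unless length $values[$i];  # skip empty cells
            push @rows, [ $i + 1, $year, $values[$i] ];
        }
    }
    return @rows;
}

print join(' | ', @$_), "\n" for rows_from_lines(@data_lines);
```

Each row could then be handed to whatever database interface is in use. If the report's first column is not the year, or some cells hold text instead of numbers, the width detection and the header numbering would need adjusting.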