Re: extracting data from table embedded in Word document

Andrew Gaffney Sat, 20 Mar 2004 21:26:46 -0800

Andrew Gaffney wrote:

I'm writing a web-based client information system for a lawyer. His current client list is in a 137-page Word document with an embedded table. I can get it into a somewhat usable format by copying the entire table, pasting it into Excel, and saving it as tab-delimited text, but this has its problems.

Some of the cells in the table have newlines in them. Because of this, when it's exported from Excel, the 2nd line will appear in the correct field, but on a line by itself:

Row 1 Firstname Lastname Address City State Zip Phone
AnotherPhone
Row 2 First2 Last addy City State Zip 555-5555

So it looks like 3 records instead of 2. Does anyone have any ideas on how to pick apart the data to get it into the DB?
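One idea for the tab-delimited export, as a rough sketch only: fold each continuation row back into the record before it. This assumes something I haven't verified against the real export, namely that a genuine record always has a non-empty first column and that a continuation row leaves its first column empty. The name merge_rows is made up for illustration, and "~~~" is just a placeholder marking where the original newline was.

```perl
use strict;
use warnings;

# Fold continuation rows into the preceding record.
# ASSUMPTION: a real record has a non-empty first column; a continuation
# row has an empty first column and text only in the cell(s) that held
# a newline in the Word table.
sub merge_rows {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                 # strip the line ending
        next if $line eq '';                  # ignore blank lines
        my @cells = split /\t/, $line, -1;    # -1 keeps trailing empty cells
        if (@records && $cells[0] eq '') {
            # Continuation row: append each non-empty cell to the same
            # column of the previous record.
            my $prev = $records[-1];
            for my $i (0 .. $#cells) {
                next if $cells[$i] eq '';
                $prev->[$i] = defined $prev->[$i] && $prev->[$i] ne ''
                    ? "$prev->[$i]~~~$cells[$i]"
                    : $cells[$i];
            }
        }
        else {
            push @records, \@cells;           # start of a new record
        }
    }
    return @records;
}
```

Each element returned is an array reference holding one record's cells, so the two-record example above would come back as 2 records rather than 3, provided the assumption about the first column holds.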

I managed to get Word to export it into a format where the fields are separated by '\r'. Yeah, kinda weird. I wrote the following:

open FILE, $file;
my $counter = 0;

while (<FILE>) {
  while (/\r?([^\r]*)/sgc) {
    $counter++;
    if($counter == 10) {
      print "\n\n";
      $counter = 1;
    }
    my $temp = $1;
    $temp =~ s/\n/~~~/sg;
    $temp =~ s/\"//g;
    $temp =~ s/\'/\\'/g;
    print " $temp ";
  }
}

This should print the contents of each field as it reads it, which it does seem to be doing. The only problem is that it seems to be printing "\n\n" after only 7 fields. Also, will that regex get everything I want it to? I need to capture everything between each set of '\r' including the first field which only has a trailing '\r'.
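A possible explanation, offered as a guess rather than a confirmed diagnosis: while (<FILE>) reads up to each "\n", so a field containing a newline gets cut across two reads, and the pattern can also match an empty string at the end of each chunk, which bumps the counter without a real field. A minimal sketch of an alternative is to read the whole file at once and split on "\r". The names parse_fields and read_all are hypothetical, the fields-per-row count is a parameter because I'm not sure of the right number, and treating a final "\r" as a terminator rather than an empty last field is an assumption.

```perl
use strict;
use warnings;

# Read a whole file as raw bytes so "\r" is not translated on any platform.
sub read_all {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    local $/;                      # slurp mode: no record separator
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Split "\r"-separated text into rows of $per_row fields each.
sub parse_fields {
    my ($text, $per_row) = @_;
    die "per_row must be positive" unless $per_row > 0;
    $text =~ s/\r\z//;             # ASSUMPTION: a final "\r" is a terminator
    my @fields = split /\r/, $text, -1;   # -1 keeps empty fields
    for (@fields) {
        s/\n/~~~/g;                # mark embedded newlines
        s/"//g;                    # drop double quotes
        s/'/\\'/g;                 # escape single quotes
    }
    my @rows;
    push @rows, [ splice(@fields, 0, $per_row) ] while @fields;
    return @rows;
}

# Small in-memory demo with made-up data (4 fields per row).
my $sample = "1\rAnn\rLee\r555-1111\n555-2222\r2\rBob\rRay\r555-5555";
print join('|', @$_), "\n" for parse_fields($sample, 4);
```

For the real file, something like parse_fields(read_all($file), 9) would match the 9-fields-per-row grouping that the counter in the loop above produces, if 9 is in fact the right column count.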

--
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548


