On 1/4/07, Wagner, David --- Senior Programmer Analyst --- WGO
<[EMAIL PROTECTED]> wrote:
        I need to look at the text from page 1 of a couple of thousand PDFs
and run a regex over it to find the data.
        Before sending this, I tried a number of other things, but they either
died or showed me data like the above.

        Any insight or simple script which will display the text would be 
greatly appreciated.

I had to do this the other day, got frustrated with the modules I
found, and ended up using pdftotext (it comes with xpdf), like so:

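 # $pdftotext holds the path to the pdftotext binary, e.g. 'pdftotext'.
 # pdftotext ends each page with a form feed (\f, shown as ^L in a pager).
 my @pages = split /\f/, `$pdftotext -layout "$inputfile" -`;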
 for my $page (@pages) {
   # do stuff
 }
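Since you only need page 1, you can have pdftotext convert just that page with its -f (first page) and -l (last page) options instead of converting whole files. Here is a minimal sketch, assuming pdftotext is on your PATH and you are on a Unix-like system (it uses the list form of a pipe open, so no shell quoting is needed). The helper names first_page_text and split_pages, and the "Invoice No" pattern, are hypothetical placeholders; swap in the regex you actually need.

```perl
use strict;
use warnings;

# pdftotext ends each page with a form feed (\f); split on it.
# split discards the trailing empty field left by the final \f.
sub split_pages {
    my ($text) = @_;
    return split /\f/, $text;
}

# Return the text of page 1 only. Assumes pdftotext (xpdf or poppler)
# is on PATH; -f/-l set the first and last page to convert, and the
# trailing '-' writes the text to STDOUT instead of a .txt file.
sub first_page_text {
    my ($file) = @_;
    # List-form pipe open runs pdftotext without a shell, so file names
    # containing spaces or quotes need no escaping.
    open my $fh, '-|', 'pdftotext', '-f', '1', '-l', '1', '-layout', $file, '-'
        or die "Cannot run pdftotext on $file: $!\n";
    my $out = do { local $/; <$fh> };    # slurp all output
    close $fh or warn "pdftotext failed on $file (status $?)\n";
    my ($page) = split_pages(defined $out ? $out : '');
    return defined $page ? $page : '';
}

# Hypothetical usage: for each PDF named on the command line, print the
# file name and the first capture of a placeholder pattern.
for my $file (@ARGV) {
    my $text = first_page_text($file);
    if ($text =~ /Invoice\s+No\.?\s*(\d+)/) {    # replace with your regex
        print "$file: $1\n";
    }
}
```

Run it as e.g. `perl scan.pl *.pdf`; with a couple of thousand files, a shell glob may be too long, in which case reading the names from a directory handle or File::Find is the usual alternative.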

Without the -layout switch, which tries to keep the text in its original
physical layout, parsing any sort of tabular data becomes a lot more
annoying.
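With -layout, table columns usually come out separated by runs of spaces, so one common heuristic (an assumption on my part, not something pdftotext guarantees) is to split each line on two or more spaces. A small sketch; layout_columns and the sample row are hypothetical:

```perl
use strict;
use warnings;

# Heuristic: split a -layout line into columns on runs of 2+ spaces.
# Single spaces inside a cell (e.g. "Widget Pro") are kept intact.
sub layout_columns {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
    return split /\s{2,}/, $line;
}

my $sample = "  Widget Pro      12     4.50\n";    # hypothetical table row
my @cols = layout_columns($sample);
print join('|', @cols), "\n";                     # prints "Widget Pro|12|4.50"
```

This breaks down when two columns happen to sit only one space apart, so check it against a few of your actual files first.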

Cheers,
Dave
