On 1/4/07, Wagner, David --- Senior Programmer Analyst --- WGO
<[EMAIL PROTECTED]> wrote:
> I need to look at the text from page 1 of a couple of thousand pdf's
> and do a regex on searching for the data.
> Before sending I tried a number of other things, but either died or
> showed me data like the above.
> Any insight or simple script which will display the text would be
> greatly appreciated.
I had to do this the other day, got frustrated with the modules I
found, and ended up using pdftotext, which comes with xpdf, like so:
# pdftotext ends each page with a form feed (\f, i.e. Ctrl-L)
my @pages = split /\f/, `$pdftotext -layout $inputfile -`;
for my $page (@pages) {
    # do stuff
}
Without the -layout switch, parsing any sort of tabular data becomes a
lot more annoying.
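Since you only need page 1, something along these lines might save
converting every page. This is only a rough sketch: it assumes pdftotext
is on your PATH and relies on its -f and -l (first/last page) options,
the sub names are made up for illustration, and the pattern is a
placeholder you would swap for your own.

```perl
use strict;
use warnings;

# Split pdftotext output into pages; each page ends with a form feed.
sub split_pages {
    my ($text) = @_;
    return split /\f/, $text;
}

# Return the text of page 1 only (hypothetical helper).
# Assumption: pdftotext is on PATH; -f 1 -l 1 limits output to page 1.
sub first_page_text {
    my ($file) = @_;
    # List-form open bypasses the shell, so odd filenames need no quoting.
    open(my $fh, '-|', 'pdftotext', '-f', 1, '-l', 1, '-layout', $file, '-')
        or die "cannot run pdftotext: $!";
    local $/;    # slurp the whole output at once
    my $text = <$fh>;
    close $fh or warn "pdftotext failed on $file\n";
    my ($page) = split_pages($text // '');
    return $page // '';
}

# Placeholder pattern, not from the original question.
my $pattern = qr/Invoice\s+No\.?\s*(\d+)/;

for my $file (@ARGV) {
    my $page = first_page_text($file);
    print "$file: $1\n" if $page =~ $pattern;
}
```

Run it with the PDF files as arguments. The list form of a piped open
may not work on older Windows perls, in which case the backtick version
above with -f 1 -l 1 added should do the same job.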
Cheers,
Dave