Jeff Pang wrote:
--- Mike Lesser <[EMAIL PROTECTED]> wrote:
Hi all. Like it says, I need to extract the content of a PDF file.
I installed the tool pdftotext, and it works fine for my needs. I
recall there was a very simple module that used this to extract text,
but for the life of me, I can't find it on CPAN! Any leads? Using a
command-line script in my own code makes me feel icky, but I guess
I'll deal...
I found this source. Is it suitable for you?
http://search.cpan.org/search?query=extract+pdf&mode=all
Well, PDF::API2 is capable of reading and creating PDFs. The problem is
that the contents of a PDF are a description of how to draw a document,
not just text. The contents are like a programming language with the
text as strings inside it. I know of no module that parses this
language so you can extract the text from it.
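
To give a feel for what I mean by "a programming language with the
text as strings inside it", here is a rough sketch. The content stream
below is one I wrote by hand for illustration; it is not taken from a
real file. BT, ET, Tf, Td and Tj are real PDF operators, but the regex
is only a naive scan and assumes the strings contain no nested or
escaped parentheses. Real PDFs usually compress their streams and map
character codes through font encodings, so don't expect this to work
on an actual document.

```perl
use strict;
use warnings;

# A hand-written, uncompressed content stream fragment (illustrative only).
my $stream = <<'END';
BT
/F1 12 Tf
72 720 Td
(Hello, world) Tj
0 -14 Td
(Second line) Tj
ET
END

# Naive scan: collect every literal string shown with the Tj operator.
# Assumes no nested or escaped parentheses inside the strings.
my @strings = $stream =~ /\(([^()\\]*)\)\s*Tj/g;

print "$_\n" for @strings;
```

This prints the two strings, one per line. Everything else in the
stream (font selection, positioning) is drawing instructions, which is
why pulling clean text out of an arbitrary PDF takes a real parser.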
WARNING: PDF::API2 is huge.
CPAN: http://search.cpan.org/~areibens/PDF-API2-0.61/lib/PDF/API2.pm
SourceForge: http://sourceforge.net/projects/pdfapi2
mailing list: http://tech.groups.yahoo.com/group/perl-text-pdf-modules/
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle