Jeff Pang wrote:
--- Mike Lesser <[EMAIL PROTECTED]> wrote:

Hi all. Like it says, I need to extract the content
of a PDF file.

I installed the tool pdftotext, and it works fine
for my needs. I recall there was a very simple module that used this to extract text, but for the life of me, I can't find it on CPAN! Any leads? Using a command-line script in my own code makes me feel icky, but I guess I'll deal...


I found this search; is it suitable for you?
http://search.cpan.org/search?query=extract+pdf&mode=all

Well, PDF::API2 is capable of reading and creating PDFs. The problem is that the contents of a PDF are a description of how to render a document, not just text. The contents are like a programming language with the text embedded in it as strings. I know of no module that parses this language so you can extract the text from it.

WARNING: PDF::API2 is huge.

CPAN: http://search.cpan.org/~areibens/PDF-API2-0.61/lib/PDF/API2.pm
SourceForge: http://sourceforge.net/projects/pdfapi2
mailing list: http://tech.groups.yahoo.com/group/perl-text-pdf-modules/
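
If you end up staying with pdftotext, as the original question suggests, calling it from Perl does not have to be too icky. Below is a minimal sketch, not a polished module: the sub names pdftotext_command and extract_pdf_text are made up for illustration, and it assumes pdftotext is on your PATH and that you are on a Unix-like system where list-form pipe opens work. Passing "-" as the output file asks pdftotext to write to standard output, and -layout asks it to keep the page layout as best it can.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for pdftotext (hypothetical helper).
# The trailing "-" tells pdftotext to write the text to standard output.
sub pdftotext_command {
    my ($pdf_file) = @_;
    return ('pdftotext', '-layout', $pdf_file, '-');
}

# Run pdftotext and return the extracted text as one string
# (hypothetical helper; assumes pdftotext is installed and on PATH).
sub extract_pdf_text {
    my ($pdf_file) = @_;

    # List-form pipe open: no shell is involved, so spaces or quotes
    # in the file name are passed through untouched.
    open(my $pipe, '-|', pdftotext_command($pdf_file))
        or die "Cannot run pdftotext: $!";

    # Slurp everything pdftotext prints.
    my $text = do { local $/; <$pipe> };
    $text = '' unless defined $text;

    # close() on a pipe returns false if the command exited non-zero.
    close($pipe)
        or die "pdftotext failed on $pdf_file (exit status $?)";

    return $text;
}

# Usage: perl extract.pl some.pdf
if (@ARGV) {
    print extract_pdf_text($ARGV[0]);
}
```

Because the command is given as a list rather than a single string, nothing is interpolated by a shell, which takes away most of the risk of wrapping a command-line tool.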

--
Just my 0.00000002 million dollars worth,
   Shawn

"For the things we have to learn before we can do them, we learn by doing them."
  Aristotle
