On Thu, Oct 15, 2009 at 3:28 PM, Marc Schwartz <marc_schwa...@me.com> wrote:
> On Oct 15, 2009, at 3:43 AM, Biedermann, Jürgen wrote:

> You don't indicate the OS you are on, but you will want to get a hold of
> 'pdftotext', which is a command line application that can extract the
> textual content from the PDF files.

 That's assuming the text is in the PDF as a text object. If it's a
scan of a paper document the chances are that all you have is an
image, in which case you need to do OCR (optical character
recognition) or get someone to type it all in again.

 Even if you can get all the text out with pdftotext, R might not be the
right tool for the job - I'd do this kind of text processing and
matching job in Python (and before Python, I'd have used Perl). But if
all you have is a wRench...
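
For what it's worth, here is a rough Python sketch of that route, not a
tested recipe. It assumes 'pdftotext' is installed and on your PATH, and
it relies on the convention that '-' as the output file sends the text to
stdout. The function names (extract_text, has_text_layer, find_lines) are
made up for illustration. The empty-output check is only a heuristic for
spotting image-only scans that would need OCR.

```python
import re
import shutil
import subprocess


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return whatever text it extracts."""
    # Assumption: the pdftotext command-line tool is installed.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # '-' as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def has_text_layer(text):
    """Heuristic: no non-whitespace output suggests an image-only scan."""
    # Page separators (form feeds) count as whitespace and are ignored.
    return bool(text.strip())


def find_lines(text, pattern):
    """Return the stripped lines of text that match a regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line.strip() for line in text.splitlines() if rx.search(line)]
```

You would then call something like extract_text("report.pdf") (a
placeholder file name), check has_text_layer() on the result, and pass
the text to find_lines() with whatever pattern you are matching.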

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
