David Kane wrote:

I have a pdf file that I would like to parse into R:

http://www.williams.edu/Registrar/geninfo/faculty.pdf

For now, I open the file in Acrobat by hand, then save it "as text"
and then use readLines(). That works fine but a) I am concerned that
some information may be lost and b) I may be doing this a lot, so I
would rather have R grab the information from the pdf file directly.

So: is there something like readPDF() for R?

Thanks,

Dave Kane

PS. If you're curious, here is the sort of work that I want to do with
this data:
http://www.ephblog.com/2010/01/08/class-update-and-faculty-ages/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Do you know this site?

http://www.accesspdf.com/pdftk/

There may be a command-line tool that converts the PDF file to XML; you could then read the XML file into R.
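
The same convert-then-read idea can be sketched in R with a plain-text converter instead of XML. This is only a rough sketch under assumptions: it relies on the external `pdftotext` program (shipped with Xpdf and Poppler) being installed and on the PATH, which is a different tool from the one linked above, and the helper names `pdftotext_args()` and `read_pdf_lines()` are made up for illustration.

```r
# Hypothetical helper: build the argument vector for the external
# `pdftotext` program. "-layout" asks it to keep the page layout,
# which tends to help with column-formatted listings.
pdftotext_args <- function(pdf, txt, layout = TRUE) {
  c(if (layout) "-layout", shQuote(pdf), shQuote(txt))
}

# Hypothetical helper: convert a PDF to a temporary text file and
# return its contents as a character vector, one element per line.
# Assumes `pdftotext` is installed and found on the PATH.
read_pdf_lines <- function(pdf, exe = "pdftotext") {
  if (!nzchar(Sys.which(exe))) {
    stop("cannot find '", exe, "' on the PATH")
  }
  txt <- tempfile(fileext = ".txt")
  on.exit(unlink(txt))
  status <- system2(exe, pdftotext_args(pdf, txt))
  if (status != 0) {
    stop("conversion failed with exit status ", status)
  }
  readLines(txt, warn = FALSE)
}

# Possible usage (not run here; needs network access and pdftotext):
# url <- "http://www.williams.edu/Registrar/geninfo/faculty.pdf"
# pdf <- tempfile(fileext = ".pdf")
# download.file(url, pdf, mode = "wb")
# faculty <- read_pdf_lines(pdf)
```

This would replace the manual "save as text" step in Acrobat, but the text still has to be parsed afterwards, and whether any information is lost depends on how the converter handles that particular file.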
