This is neither the Xpdf support forum nor the Windows Setup Program Reinvention support group... and you really need to read and follow the Posting Guide for the R mailing lists.
FWIW I would guess that you need to learn about environment variables and in particular about the PATH variable. There are subtleties about when and how they get defined that are OS-specific and certainly off topic here that may trip you up along the way. Alternatively, you may read the Xpdf documentation or a how-to blog about Xpdf that gives you a recipe, but again that is not about R. Once you can start a CMD shell and run the command directly then you are most of the way to getting R to invoke it.

-- 
Sent from my phone. Please excuse my brevity.

On July 21, 2016 5:26:26 PM PDT, Steven Kang <stochastick...@gmail.com> wrote:
> Hi R users,
>
> I’m having some issues trying to extract texts from PDF file using tm package.
>
> Here are the steps that were carried out:
>
> 1. Downloaded and installed the following programs:
>
> - Xpdf (Copied the ‘bin32’, ‘bin64’, ‘doc’ folders into ‘C:\Program Files\Xpdf’ directory; also added C:\Program Files\Xpdf\bin64\pdfinfo.exe & C:\Program Files\Xpdf\bin64\pdftotext.exe in existing PATH)
>
> - Tesseract
>
> - Imagemagick
>
> 2. Used the following scripts and the corresponding error messages:
>
> # Directory where PDF files are stored
>
> > cname <- getwd()
>
> > Corpus(DirSource(cname), readerControl=list(reader = readPDF))
>
> Error in system2("pdftotext", c(control$text, shQuote(x), "-"), stdout = TRUE) :
>   '"pdftotext"' not found
>
> In addition: Warning message:
>
> running command '"pdfinfo" "C:\Users\R_Files\XXX.pdf"' had status 127
>
> > file.exists(Sys.which(c("pdfinfo","pdftpotext")))
> [1] FALSE FALSE
>
> It seems like R can’t find pdfinfo & pdftotext exe files, but not sure as to why this would be the case despite xpdf files being copied into ‘C:\Program Files’ (Im using Windows 7 64bits)
>
> I’m aware that ‘pdf_text’ function from pdftools package can extract texts from PDF file and outputs into a string. But I was after something which is able to convert PDF (ie transaction data) into a dataframe without regular expression. Is tm package capable of doing this conversion? Are there any other alternatives to these methods?
>
> Your expertise in resolving this problem would be highly appreciated.
>
>
> Steve
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
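
P.S. On the PATH point above, here is a minimal sketch of how one might check from inside R whether the Xpdf tools are visible, and add a directory to PATH for the current session only. Two details in the quoted post may matter here: PATH entries are normally directories rather than paths to individual .exe files, and the Sys.which() call spells the tool "pdftpotext". The helper names tools_on_path and prepend_to_path are made up for illustration, and the install directory is assumed from the quoted post, so adjust it to your machine.

```r
# Hypothetical helper: report whether each tool resolves on the PATH that R sees.
# Sys.which() returns "" for any command it cannot find.
tools_on_path <- function(tools) {
  found <- Sys.which(tools)
  setNames(nzchar(found), tools)
}

# Hypothetical helper: prepend a directory to PATH for this R session only.
# The system-wide PATH is left untouched; the old value is returned invisibly.
prepend_to_path <- function(dir) {
  old <- Sys.getenv("PATH")
  Sys.setenv(PATH = paste(dir, old, sep = .Platform$path.sep))
  invisible(old)
}

# Assumed install location, taken from the quoted post. Note that this is the
# directory containing pdfinfo.exe and pdftotext.exe, not the .exe files themselves.
xpdf_dir <- "C:/Program Files/Xpdf/bin64"

if (dir.exists(xpdf_dir)) {
  prepend_to_path(xpdf_dir)
}

# TRUE for each tool that R can now locate, FALSE otherwise.
print(tools_on_path(c("pdfinfo", "pdftotext")))
```

If both entries come back TRUE, system2("pdftotext", ...) as called by tm should at least be able to start the program. If they stay FALSE, the problem lies with the installation or the PATH setting rather than with R.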