This code is using R like a command shell... there really is not much chance that R is the problem, and this is not a "tesseract" support forum, so this seems quite off-topic.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

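One way to test the point that R is unlikely to be the problem is to take the string pasting out of the picture: pass the program and its arguments separately through system2() and look at the exit status. A minimal sketch follows. tesseract_args(), run_step() and the file name scan.pdf.tif are hypothetical names introduced here for illustration, and the executable path is simply the one from the quoted post. It is not claimed to fix the crash, only to show where the failure occurs.

```r
# Hypothetical helper (not from the original post): build the tesseract
# arguments for one image, quoting each path separately instead of
# quoting the whole command line at once.
tesseract_args <- function(tif, out_base, lang = "eng") {
  c(shQuote(tif), shQuote(out_base), "-l", lang)
}

# Hypothetical wrapper: run one external command and report its exit status.
run_step <- function(exe, args) {
  status <- system2(exe, args = args)
  if (status != 0) {
    warning(sprintf("%s exited with status %d", basename(exe), status))
  }
  invisible(status)
}

# Path as quoted in the post; adjust to the local installation.
tesseract <- "F:/Tesseract-OCR/tesseract.exe"

# "scan.pdf.tif" is an assumed example file name.
args <- tesseract_args("scan.pdf.tif", "scan.pdf")

# Show the command that would be run, so it can be pasted into a terminal.
print(paste(c(tesseract, args), collapse = " "))

# Only attempt the call when the executable is actually present.
if (file.exists(tesseract)) {
  run_step(tesseract, args)
}
```

If the printed command also fails when typed at a Windows command prompt, the problem lies with tesseract or its input file rather than with R.
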
On August 12, 2015 10:05:19 PM PDT, Anshuk Pal Chaudhuri <anshu...@motivitylabs.com> wrote:
>Hi All,
>
>I have been trying to do OCR within R (reading PDF data that is stored
>as a scanned image). I have been reading about this at
>http://electricarchaeology.ca/2014/07/15/doing-ocr-within-r/
>
>This is a very good post.
>
>Effectively 3 steps:
>
>convert pdf to ppm (an image format)
>convert ppm to tif ready for tesseract (using ImageMagick for convert)
>convert tif to text file
>The effective code for the above 3 steps as per the linked post:
>
>lapply(myfiles, function(i){
>  # convert pdf to ppm (an image format), just pages 1-10 of the PDF
>  # but you can change that easily, just remove or edit the
>  # -f 1 -l 10 bit in the line below
>  shell(shQuote(paste0("F:/xpdf/bin64/pdftoppm.exe ", i, " -f 1 -l 10 -r 600 ocrbook")))
>  # convert ppm to tif ready for tesseract
>  shell(shQuote(paste0("F:/ImageMagick-6.9.1-Q16/convert.exe *.ppm ", i, ".tif")))
>  # convert tif to text file
>  shell(shQuote(paste0("F:/Tesseract-OCR/tesseract.exe ", i, ".tif ", i, " -l eng")))
>  # delete tif file
>  file.remove(paste0(i, ".tif" ))
>})
>The first two steps are happening fine (although taking a good amount
>of time for 4 pages of a PDF, but I will look into the scalability part
>later; first I am checking whether this works at all).
>
>While running this, the first two steps work fine.
>
>While running the 3rd step, i.e.
>
>shell(shQuote(paste0("F:/Tesseract-OCR/tesseract.exe ", i, ".tif ", i, " -l eng")))
>I am getting this error:
>
>Error: evaluation nested too deeply: infinite recursion /
>options(expressions=)?
>
>Or
>
>Tesseract is crashing.
>
>Any workaround or root cause analysis would be appreciated.
>
>Regards,
>Anshuk Pal Chaudhuri
>
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.