[ https://issues.apache.org/jira/browse/TIKA-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich resolved TIKA-630.
----------------------------------
    Resolution: Fixed

> Dealing with PDF documents from scanning programs
> -------------------------------------------------
>
>                 Key: TIKA-630
>                 URL: https://issues.apache.org/jira/browse/TIKA-630
>             Project: Tika
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 0.10
>            Reporter: Joseph Vychtrle
>            Priority: Minor
>              Labels: ocr, pdf
>
> Hey,
> sorry I didn't post this to the mailing list, I didn't get the confirmation.
> The issue is that people often don't realize there is a difference between
> PDF documents exported from OpenOffice or MS Office and PDFs produced by
> scanner software. When Tika processes a scanned document, it detects the PDF
> content type, but the file contains only images. I don't know how to deal with that.
> There should be a function that determines which kind of PDF a document is, so that I
> can send PDFs from scanner software to OCR software.
> If there is a way to do that, could anybody please explain how?
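One possible way to approximate the check requested above is a text-density heuristic: extract the text first (for example with Tika), then treat the PDF as probably scanned when it yields almost no non-whitespace characters per page. The sketch below is not part of Tika and is not the fix applied for this issue. The class name `ScannedPdfHeuristic`, the method `looksScanned`, and the `minCharsPerPage` threshold are hypothetical names chosen for illustration, and the text extraction and page counting are assumed to happen elsewhere.

```java
/**
 * Hypothetical helper, not part of Tika: guesses whether a PDF is an
 * image-only scan from the amount of text that could be extracted from it.
 */
final class ScannedPdfHeuristic {

    private ScannedPdfHeuristic() {
        // static utility, no instances
    }

    /** Counts the characters in the text that are not whitespace. */
    static int countNonWhitespace(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns true when the extracted text averages fewer than
     * minCharsPerPage non-whitespace characters per page.
     *
     * @param extractedText   text obtained from the PDF elsewhere (may be null)
     * @param pageCount       number of pages in the PDF, must be positive
     * @param minCharsPerPage assumed threshold; tune it for your documents
     */
    static boolean looksScanned(String extractedText, int pageCount, int minCharsPerPage) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        String text = (extractedText == null) ? "" : extractedText;
        // Compare totals in long arithmetic to avoid int overflow on large inputs.
        return countNonWhitespace(text) < (long) pageCount * minCharsPerPage;
    }
}
```

A caller could pass the extracted text and page count to `looksScanned` and route the file to OCR when it returns true. The heuristic can misjudge scans that already carry a hidden text layer, as well as genuinely sparse documents such as slides, so the threshold is only a starting point.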