[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1445:
------------------------------
    Attachment: TIKA-1445_20150106_tallison.patch

There were two problems:

1) This aborted before parsing the metadata if there is no Tesseract installed
{noformat}
if (!ExternalParser.check(checkCmd)) return;
{noformat}

2) The call to getSupportedTypes in the _TMP_X_PARSERs always returned false because of a conflict of class types.

If this modification looks ok, I'll add a few more test cases and commit it.

Side note: In working on this I realized that both the ImageParser and the JpegParser support jpegs. On some files, one parser returns more info than the other and vice versa...another case of competing parsers! :)

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: 000003.doc, TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_20150106_tallison.patch,
> TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch,
> TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities by the other
> Image parsers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
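
For readers following problem 1 in the comment above, here is a minimal Java sketch of the ordering issue. It is not the attached patch and does not use Tika APIs: the class OcrOrderingSketch, the parse method, and its functional parameters are all hypothetical stand-ins. The assumption illustrated is that metadata extraction can run before the check for the external OCR tool, so that a missing Tesseract install skips only the OCR step.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BooleanSupplier;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from Tika.
public class OcrOrderingSketch {

    /**
     * Collects metadata first, then adds OCR text only if the OCR tool is available.
     *
     * @param resource          name of the image being parsed (illustrative only)
     * @param metadataExtractor stand-in for the non-OCR image metadata parsers
     * @param ocrAvailable      stand-in for a check such as "is the OCR binary installed?"
     * @param ocr               stand-in for the OCR step itself
     */
    static Map<String, String> parse(
            String resource,
            BiConsumer<String, Map<String, String>> metadataExtractor,
            BooleanSupplier ocrAvailable,
            Function<String, String> ocr) {
        Map<String, String> result = new LinkedHashMap<>();

        // Run metadata extraction first so it never depends on the OCR tool.
        metadataExtractor.accept(resource, result);

        // Only the OCR step is skipped when the external tool is missing.
        if (!ocrAvailable.getAsBoolean()) {
            return result;
        }
        result.put("ocr.text", ocr.apply(resource));
        return result;
    }

    public static void main(String[] args) {
        // Simulate a machine without the OCR tool: metadata is still returned.
        Map<String, String> withoutOcr = parse(
                "photo.jpg",
                (name, md) -> md.put("resourceName", name),
                () -> false,
                name -> "unused");
        System.out.println(withoutOcr);
    }
}
```

With the early return placed ahead of the metadata step instead, the returned map would be empty whenever the availability check fails, which is the behaviour problem 1 describes.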