[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217100#comment-14217100 ]
Lewis John McGibbney commented on TIKA-1445:
--------------------------------------------

With Any23 we can run many extractors against one MediaType; in that case the output is a set of triples. For Tika, starting with a scenario where we *just* populate the Metadata container would be an excellent first step. Tonight I'll investigate how Any23 currently chains its extractors together and will do my best to report back here. [~p_ansell] may be able to help as well, since he has been influential in refactoring Any23's extractor behavior in the past.

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
> Key: TIKA-1445
> URL: https://issues.apache.org/jira/browse/TIKA-1445
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.8
>
> Attachments: TIKA-1445.Mattmann.101214.patch.txt,
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt,
> TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types,
> consider how to add back in the metadata extraction capabilities provided by
> the other Image parsers.
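The comment proposes running several extractors against one MediaType, with each one simply populating a shared metadata container. The sketch below is a minimal, self-contained illustration of that idea. It is not Tika's or Any23's actual API: `MetadataExtractor`, `ExtractorChain` and the `x-length` / `x-first-byte` keys are hypothetical. A plain `Map<String, String>` stands in for Tika's `Metadata` class. The content is buffered once, and each extractor receives a fresh stream over the bytes, because an `InputStream` can only be consumed once.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractorChainSketch {

    /** Hypothetical extractor contract: read the stream, add key/value pairs to the shared metadata. */
    interface MetadataExtractor {
        void extract(InputStream stream, Map<String, String> metadata) throws IOException;
    }

    /** Hypothetical chain: runs every extractor registered for a media type, in registration order. */
    static final class ExtractorChain {
        private final Map<String, List<MetadataExtractor>> byMediaType = new LinkedHashMap<>();

        void register(String mediaType, MetadataExtractor extractor) {
            byMediaType.computeIfAbsent(mediaType, k -> new ArrayList<>()).add(extractor);
        }

        Map<String, String> run(String mediaType, byte[] content) throws IOException {
            // One container shared by all extractors; insertion order is preserved.
            Map<String, String> metadata = new LinkedHashMap<>();
            for (MetadataExtractor extractor : byMediaType.getOrDefault(mediaType, List.of())) {
                // Each extractor gets its own stream over the buffered bytes.
                try (InputStream in = new ByteArrayInputStream(content)) {
                    extractor.extract(in, metadata);
                }
            }
            return metadata;
        }
    }

    public static void main(String[] args) throws IOException {
        ExtractorChain chain = new ExtractorChain();
        // Two stand-in extractors; both see the full content independently.
        chain.register("image/png", (in, md) -> md.put("x-length", String.valueOf(in.readAllBytes().length)));
        chain.register("image/png", (in, md) -> md.put("x-first-byte", String.valueOf(in.read())));
        Map<String, String> result = chain.run("image/png", "fake png bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(result); // prints {x-length=14, x-first-byte=102}
    }
}
```

In this sketch, extractors that write the same key overwrite one another, so the last one registered wins. A real chain would need an explicit policy for such conflicts, for example keeping the first value or storing multiple values.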