[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216466#comment-14216466 ]
Nick Burch commented on TIKA-1445:
----------------------------------

Anyone using tika-parser OOTB has two parser services files - built-in and vorbis. Anyone adding a third-party parser under a non-ASLv2 license off the wiki will get a third. Anyone adding their own custom parsers following the instructions on the website will get a few more.

My hunch is that most users won't care at all about the order in which the parsers are asked "hey, can you handle this file type?". My second hunch is that users who do care will typically only care about it for a handful of formats, e.g. "for jpeg try ocr then image, everything else default is fine".

We also need to support those users who currently say "I don't care what you find on the classpath, I only ever want you to use these 5 parsers and in this explicit order I'm passing you now".

I can describe the problem, but I'm not sure of the right solution at this point...

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: TIKA-1445.Mattmann.101214.patch.txt, TIKA-1445.Palsulich.102614.patch, TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, consider how to add back in the metadata extraction capabilities by the other Image parsers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)