[jira] [Updated] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

Tim Allison (JIRA) Tue, 06 Jan 2015 18:45:07 -0800

     [ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tim Allison updated TIKA-1445:
------------------------------
    Attachment: TIKA-1445_20150106_tallison.patch

There were two problems:

1) The Tesseract parser returned before extracting any metadata whenever Tesseract was not installed:

{noformat}
if (!ExternalParser.check(checkCmd))
 return;
{noformat}

2) The call to getSupportedTypes in the _TMP_X_PARSERs always returned false 
because of a conflict of class types.
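
For anyone following along, here is a minimal, self-contained sketch of the two problems. It is not the code in the attached patch and it does not use Tika's classes: the Type class and the parse, supportsBuggy and supportsFixed methods are hypothetical stand-ins. For problem 2, it assumes the conflict was a lookup in a set of type objects using a key of a different class, which is one common way such a check ends up always false in Java.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class OcrMetadataSketch {

    /** Hypothetical stand-in for a media type class; this is not Tika's MediaType. */
    static final class Type {
        final String name;

        Type(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Type && ((Type) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /**
     * Problem 1: collect metadata first, and only then bail out if the
     * OCR binary is missing. The ocrAvailable flag is a stand-in for the
     * result of a check such as ExternalParser.check(checkCmd).
     */
    static Map<String, String> parse(String typeName, boolean ocrAvailable) {
        Map<String, String> metadata = new LinkedHashMap<>();
        // Stand-in for real image metadata extraction.
        metadata.put("Content-Type", typeName);
        if (!ocrAvailable) {
            return metadata; // skip OCR only; metadata is already populated
        }
        metadata.put("ocr", "done"); // stand-in for running OCR
        return metadata;
    }

    /** Problem 2 (assumed form): a String key never equals a Type element. */
    static boolean supportsBuggy(Set<Type> supported, String typeName) {
        return supported.contains(typeName);
    }

    /** Same lookup, with the key converted to the element class first. */
    static boolean supportsFixed(Set<Type> supported, String typeName) {
        return supported.contains(new Type(typeName));
    }

    public static void main(String[] args) {
        Set<Type> supported = new HashSet<>();
        supported.add(new Type("image/jpeg"));

        System.out.println("buggy lookup: " + supportsBuggy(supported, "image/jpeg"));
        System.out.println("fixed lookup: " + supportsFixed(supported, "image/jpeg"));
        System.out.println("no OCR installed: " + parse("image/jpeg", false));
    }
}
```

The buggy lookup compiles without complaint because Set.contains accepts any Object, which is why this kind of mismatch is easy to miss.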

If this modification looks OK, I'll add a few more test cases and commit it.

Side note: While working on this, I realized that both the ImageParser and the 
JpegParser support JPEGs. On some files, one parser returns more information 
than the other, and vice versa... another case of competing parsers! :)

> Figure out how to add Image metadata extraction to Tesseract parser
> -------------------------------------------------------------------
>
>                 Key: TIKA-1445
>                 URL: https://issues.apache.org/jira/browse/TIKA-1445
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.8
>
>         Attachments: 000003.doc, TIKA-1445.Mattmann.101214.patch.txt, 
> TIKA-1445.Palsulich.102614.patch, TIKA-1445_20150106_tallison.patch, 
> TIKA-1445_tallison_20141027.patch.txt, TIKA-1445_tallison_v2_20141027.patch, 
> TIKA-1445_tallison_v3_20141027.patch
>
>
> Now that Tesseract is the default image parser in Tika for many image types, 
> consider how to add back in the metadata extraction capabilities by the other 
> Image parsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
