Refactor image and jpeg parsers for access to MetadataExtractor API
-------------------------------------------------------------------

                 Key: TIKA-482
                 URL: https://issues.apache.org/jira/browse/TIKA-482
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.7
            Reporter: Staffan Olsson


When I added support for more image metadata in TIKA-472, I realized
that the current design has some restrictions:
 * I could not access the typed getters from Metadata Extractor, such
as getDate (to format an ISO date) and getStringArray (for keywords).
 * The handler function is called one field at a time, which prevents
logic where one field depends on the value of another (there are, for
example, record versions and fields that specify encoding).
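
A rough sketch of the second point, in plain Java with no Tika or Metadata
Extractor dependency. Every name in it (WholeRecordExtractorSketch, Record,
extract, and the field names "created", "keywords", "caption", "encoding") is
hypothetical and only illustrates the idea: when the extractor receives the
whole record with typed values, it can format a date, keep keywords as
separate values, and let one field decide how another is decoded. It is not
the code in the attached patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: none of these types are Tika or Metadata Extractor classes. */
final class WholeRecordExtractorSketch {

    /** Stand-in for a parsed metadata record that keeps typed values. */
    static final class Record {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        Record put(String name, Object value) {
            fields.put(name, value);
            return this;
        }

        /** Typed getter: returns the value if it has the requested type, else null. */
        <T> T get(String name, Class<T> type) {
            Object value = fields.get(name);
            return type.isInstance(value) ? type.cast(value) : null;
        }
    }

    /** Receives the whole record, so fields can be read in any order and combined. */
    static Map<String, List<String>> extract(Record record) {
        Map<String, List<String>> out = new LinkedHashMap<>();

        // Typed access: the date is formatted once, as an ISO 8601 instant in UTC.
        Date created = record.get("created", Date.class);
        if (created != null) {
            add(out, "created", created.toInstant().toString());
        }

        // Typed access: each keyword stays a separate value instead of one joined string.
        String[] keywords = record.get("keywords", String[].class);
        if (keywords != null) {
            for (String keyword : keywords) {
                add(out, "keyword", keyword);
            }
        }

        // Cross-field logic: the "encoding" field decides how the "caption" bytes are decoded.
        // Falling back to US-ASCII is an assumption made for this sketch only.
        byte[] caption = record.get("caption", byte[].class);
        if (caption != null) {
            String encoding = record.get("encoding", String.class);
            Charset charset = encoding != null
                    ? Charset.forName(encoding)
                    : StandardCharsets.US_ASCII;
            add(out, "caption", new String(caption, charset));
        }
        return out;
    }

    private static void add(Map<String, List<String>> out, String name, String value) {
        out.computeIfAbsent(name, key -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Record record = new Record()
                .put("encoding", "ISO-8859-1")
                .put("caption", new byte[] {(byte) 0xE5})
                .put("keywords", new String[] {"harbour", "boat"})
                .put("created", new Date(0L));
        System.out.println(extract(record));
    }
}
```

A handler that is called once per field, with string values only, cannot do
the caption step unless it keeps state between calls.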

See the attached patch. It refactors TiffExtractor into MetadataExtractorExtractor.
The patch also includes the date fix; see
https://issues.apache.org/jira/browse/TIKA-451#action_12898794

We can later add more Extractors using other libraries, and map them to parsers
based on format. For example, we already use ImageIO in ImageParser, so maybe
there should be an ImageIOExtractor. To support more image formats we could
investigate XMP, for example using 
http://www.pkg.dk/projects/XMP-Utilities-for-Java-XMPUtil4J/.
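
One way the format-to-extractor mapping could look, as a minimal sketch. The
Extractor interface, the registry class, and the choice of which MIME type
goes to which library are all assumptions made for illustration; nothing
here exists in Tika.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: these names are illustrative, not Tika classes. */
final class ExtractorRegistrySketch {

    /** Hypothetical common interface that each library-specific extractor would implement. */
    interface Extractor {
        /** Name of the library this extractor wraps. */
        String library();
    }

    private final Map<String, Extractor> byMimeType = new LinkedHashMap<>();

    ExtractorRegistrySketch register(String mimeType, Extractor extractor) {
        byMimeType.put(mimeType, extractor);
        return this;
    }

    /** Returns the extractor registered for a format, or null when there is none. */
    Extractor forType(String mimeType) {
        return byMimeType.get(mimeType);
    }

    public static void main(String[] args) {
        // The pairing of formats and libraries below is an assumption for the example.
        ExtractorRegistrySketch registry = new ExtractorRegistrySketch()
                .register("image/jpeg", () -> "Metadata Extractor")
                .register("image/png", () -> "ImageIO");
        System.out.println(registry.forType("image/jpeg").library());
    }
}
```

A parser would then look up the extractor for the detected type and leave
the library-specific work to it.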
