[ 
https://issues.apache.org/jira/browse/TIKA-482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907097#action_12907097
 ] 

Staffan Olsson commented on TIKA-482:
-------------------------------------

Merged EXIF parsing so now all fields are processed the same way, in:
http://github.com/solsson/tika/commit/e61048c177560b8aa5585fd5b1d9194f446bec65
and some minor additions in:
http://github.com/solsson/tika/commit/c22107201178064ffd2260ca9136cb0d57c46d1f
http://github.com/solsson/tika/commit/880c0eb11f8410296d9b1401afef2dc37abbaf24

Set dates as Nick suggested:
http://github.com/solsson/tika/commit/866d396497dd7b95329f465d8cb220ad2899dc8b

Keywords are handled as multi-value since:
http://github.com/solsson/tika/commit/9742c826a5edad6d0288b83d3653735dd85b116f
Note:
 * Assertions for the "subject" field and for Unicode characters in the 
description may need to be commented out until XMP support is merged.
 * This commit disables the copying of all fields, for reasons stated in the 
commit comment.
Could the copying be done as in PDFParser, covering only the fields that are 
not explicitly mapped?
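
To make the question concrete, here is a minimal sketch of the idea, not the code from the commits above. A plain map stands in for the metadata object, and every name in it (UnmappedFieldCopy, addKeywords, copyUnmapped, the field names and values) is hypothetical; it does not use the Tika or Metadata Extractor APIs. Keywords are added one value at a time, and the generic copy skips any field that already has an explicit mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a plain map stands in for a multi-valued metadata
// store. None of these names come from Tika or Metadata Extractor.
final class UnmappedFieldCopy {

    /** Adds each keyword as its own value instead of one joined string. */
    static void addKeywords(Map<String, List<String>> target, String[] keywords) {
        List<String> values = target.computeIfAbsent("keywords", k -> new ArrayList<>());
        values.addAll(Arrays.asList(keywords));
    }

    /** Copies raw fields, skipping names that already have an explicit mapping. */
    static void copyUnmapped(Map<String, String> raw,
                             Set<String> explicitlyMapped,
                             Map<String, List<String>> target) {
        for (Map.Entry<String, String> field : raw.entrySet()) {
            if (explicitlyMapped.contains(field.getKey())) {
                continue; // handled by a dedicated mapping, do not duplicate
            }
            target.computeIfAbsent(field.getKey(), k -> new ArrayList<>())
                  .add(field.getValue());
        }
    }

    public static void main(String[] args) {
        // Illustrative raw fields as a parser might report them.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("Keywords", "bird;nature");
        raw.put("Software", "ExampleCam 1.0");

        Set<String> mapped = new HashSet<>(Arrays.asList("Keywords"));
        Map<String, List<String>> target = new LinkedHashMap<>();

        addKeywords(target, new String[] {"bird", "nature"});
        copyUnmapped(raw, mapped, target);
        System.out.println(target);
    }
}
```

With this shape the raw "Keywords" string is never copied alongside the split values, so a field shows up either through its explicit mapping or through the generic copy, not both.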



> Refactor image and jpeg parsers for access to MetadataExtractor API
> -------------------------------------------------------------------
>
>                 Key: TIKA-482
>                 URL: https://issues.apache.org/jira/browse/TIKA-482
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.7
>            Reporter: Staffan Olsson
>         Attachments: TIKA-451-DublinCore_and_TIKA-482.patch
>
>
> When I added support for more image metadata in TIKA-472, I realized
> the current design had some restrictions:
>  * I could not access the typed getters from Metadata Extractor, such
> as getDate (to format ISO dates) and getStringArray (for keywords).
>  * The handler function was called one field at a time, which prevents
> logic where one field depends on the value of another (there are, for
> example, record versions and fields that specify encoding).
> See attached patch. It refactors TiffExtractor to MetadataExtractorExtractor.
> The patch also includes the date fix, see 
> https://issues.apache.org/jira/browse/TIKA-451#action_12898794
> We can later add more Extractors using other libraries, and map to parsers 
> based on format. For example we already use ImageIO in ImageParser so maybe 
> there should be an ImageIOExtractor.
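
The second restriction in the quoted description can be illustrated with a small sketch. It is not taken from the patch: the class name, the field names "encoding" and "caption", and the ISO-8859-1 fallback are all assumptions made for the example. The point is only that a handler which sees the whole set of fields can consult one field before interpreting another, regardless of the order in which they arrive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: raw fields are kept as bytes until the whole set is
// available, so the declared encoding can be looked up before decoding.
final class WholeRecordSketch {

    static final String ENCODING = "encoding"; // assumed field name
    static final String CAPTION = "caption";   // assumed field name

    /** Decodes the caption using the charset named by a sibling field. */
    static String decodeCaption(Map<String, byte[]> fields) {
        byte[] declared = fields.get(ENCODING);
        // Assumption for this sketch: fall back to ISO-8859-1 when no
        // encoding field is present.
        Charset charset = declared == null
                ? StandardCharsets.ISO_8859_1
                : Charset.forName(new String(declared, StandardCharsets.US_ASCII));
        byte[] caption = fields.get(CAPTION);
        return caption == null ? null : new String(caption, charset);
    }

    public static void main(String[] args) {
        Map<String, byte[]> fields = new LinkedHashMap<>();
        // The caption arrives before the field that says how to decode it.
        fields.put(CAPTION, "Bl\u00e5b\u00e4r".getBytes(StandardCharsets.UTF_8));
        fields.put(ENCODING, "UTF-8".getBytes(StandardCharsets.US_ASCII));
        System.out.println(decodeCaption(fields));
    }
}
```

A callback invoked once per field would have to decode the caption before it had seen the encoding field, which is the kind of ordering problem the refactoring is meant to avoid.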

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.