[ 
https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209162#comment-15209162
 ] 

Ray Gauss II commented on TIKA-774:
-----------------------------------

bq. we should add a static check for whether exiftool is available and adjust 
"handled" mimes at that point.

I think we'll find other areas to improve as well. I just wanted to get the 
ball rolling again on the contribution and review, as we had to close the source 
on the stand-alone project mentioned above.
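
For illustration, the static availability check suggested in the quote could look roughly like the sketch below. It is not taken from the attached patches: the class name, the helper methods, and the use of plain strings for MIME types are all assumptions made to keep the example self-contained. It relies only on `exiftool -ver`, which prints the installed version and exits with status 0.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the TIKA-774 patches.
final class ExifToolCheck {

    // Illustrative subset of types; a real parser would use Tika MediaType objects.
    private static final Set<String> CANDIDATE_TYPES = Set.of("image/jpeg", "image/tiff");

    // Evaluated once, when the class is loaded.
    static final boolean EXIFTOOL_AVAILABLE = isCommandAvailable("exiftool", "-ver");

    /** Returns true if the command starts and exits with status 0 within 5 seconds. */
    static boolean isCommandAvailable(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(5, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            // Typically means the executable was not found on the PATH.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Advertises no types at all when the tool is missing. */
    static Set<String> supportedTypes(boolean available) {
        return available ? CANDIDATE_TYPES : Collections.emptySet();
    }

    static Set<String> supportedTypes() {
        return supportedTypes(EXIFTOOL_AVAILABLE);
    }
}
```

With this shape, a parser's list of handled types would be empty on machines without ExifTool, so other parsers would be selected for those files instead.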

bq. I should have a chance to look more closely early next week, but I doubt 
there's reason to wait for my feedback.

We'd value your feedback, and since it's been over 4 years, we can wait a few 
more weeks. :)

bq. Is this a replacement for the one I hacked together?

There's the possibility for the two to coexist, perhaps requiring this parser 
to be explicitly called programmatically.

At a high level the biggest differences are:
# As mentioned in TIKA-1639, there's an extensive mapping from ExifTool's 
namespace to proper Tika properties (currently done programmatically)
# It includes the ability to embed, i.e. to write metadata back into binary 
files (TIKA-776).
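
To make the first point concrete, a programmatic mapping of this kind might be sketched as below. The entries are illustrative placeholders rather than the mapping table from the patch, the class and method names are hypothetical, and plain strings stand in for Tika Property objects so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real mapping in the patch is far more extensive.
final class ExifToolMappingSketch {

    // Illustrative entries only: ExifTool "Group:Tag" name -> metadata key.
    static final Map<String, String> EXIFTOOL_TO_TIKA = Map.of(
            "XMP-dc:Title", "dc:title",
            "XMP-dc:Creator", "dc:creator",
            "IPTC:Keywords", "dc:subject");

    /** Copies mapped fields to their target keys and skips unmapped ones. */
    static Map<String, List<String>> mapToTika(Map<String, List<String>> exifToolValues) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : exifToolValues.entrySet()) {
            String target = EXIFTOOL_TO_TIKA.get(entry.getKey());
            if (target == null) {
                continue; // no mapping defined for this ExifTool field
            }
            // Several source fields may feed one multi-valued target key.
            result.computeIfAbsent(target, key -> new ArrayList<>())
                  .addAll(entry.getValue());
        }
        return result;
    }
}
```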

> ExifTool Parser
> ---------------
>
>                 Key: TIKA-774
>                 URL: https://issues.apache.org/jira/browse/TIKA-774
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Requires ExifTool to be installed 
> (http://www.sno.phy.queensu.ca/~phil/exiftool/)
>            Reporter: Ray Gauss II
>              Labels: features, new-parser, newbie, patch
>             Fix For: 1.13
>
>         Attachments: testJPEG_IPTC_EXT.jpg, 
> tika-core-exiftool-parser-patch.txt, tika-parsers-exiftool-parser-patch.txt
>
>
> Adds an external parser that calls ExifTool to extract extended metadata 
> fields from images and other content types.
> In the core project:
> An ExifTool interface is added which contains Property objects that define 
> the metadata fields available.
> An additional Property constructor is added for the internalTextBag type.
> In the parsers project:
> An ExiftoolMetadataExtractor is added which does the work of calling ExifTool 
> on the command line and mapping the response to tika metadata fields.  This 
> extractor could be called instead of or in addition to the existing 
> ImageMetadataExtractor and JempboxExtractor under TiffParser and/or 
> JpegParser but those have not been changed at this time.
> An ExiftoolParser is added which calls only the ExiftoolMetadataExtractor.
> An ExiftoolTikaMapper is added which is responsible for mapping the ExifTool 
> metadata fields to existing tika and Drew Noakes metadata fields if enabled.
> An ElementRdfBagMetadataHandler is added for extracting multi-valued RDF Bag 
> implementations in XML files.
> An ExifToolParserTest is added which tests several expected XMP and IPTC 
> metadata values in testJPEG_IPTC_EXT.jpg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
