[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148016#comment-13148016 ]

Ray Gauss II commented on TIKA-775:
-----------------------------------

I think there are many use cases for embedding metadata as well as extracting 
it. Our specific case is this: we're using extensions to Alfresco that let 
users modify or enter new metadata through its web interface. That triggers an 
Alfresco metadata embedder, which uses these Tika additions to actually write 
the metadata into the file.

We currently focus on embedding IPTC and XMP metadata in images, but I'd expect 
people to have similar needs in all sorts of clients and apps, e.g. embedding 
ID3 tags in audio or MPEG-7 descriptions in video.

I'm sure there are other existing tools, but Tika is, quite frankly, pretty 
sweet. These 'write' capabilities seem like a perfect fit for Tika's existing 
'read' features, and the metadata tags and concepts are already present and 
well organized.

I agree that great care should be taken in implementing this. I've tried to 
structure things to follow the precedent set on the parsing side, but I'm 
pretty new to the project.

I'll have a look at refactoring so that an OutputStream is passed in as an 
argument.
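
A minimal sketch of what an embed call that takes an OutputStream argument 
could look like. The caller owns both streams, the same way a parser is handed 
its input. Everything here is hypothetical. `SketchEmbedder`, 
`AppendingEmbedder`, and the plain `Map<String, String>` standing in for Tika's 
`Metadata` class are illustrative names only, not the interface in the 
attached patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical embed contract: the caller supplies both streams, so the embedder
// writes into the OutputStream it is handed instead of returning a new stream.
interface SketchEmbedder {
    void embed(Map<String, String> metadata, InputStream original, OutputStream output)
            throws IOException;
}

// Hypothetical implementation: copies the original bytes unchanged, then appends
// each metadata entry as a "key: value" line (a stand-in for a real format writer).
class AppendingEmbedder implements SketchEmbedder {
    @Override
    public void embed(Map<String, String> metadata, InputStream original, OutputStream output)
            throws IOException {
        original.transferTo(output); // copy the original content (Java 9+)
        // TreeMap gives a deterministic, key-sorted order for the appended lines
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            String line = e.getKey() + ": " + e.getValue() + "\n";
            output.write(line.getBytes(StandardCharsets.UTF_8));
        }
        output.flush();
    }
}

public class EmbedSketch {
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("body\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new AppendingEmbedder().embed(Map.of("description", "Test"), in, out);
        System.out.print(out.toString(StandardCharsets.UTF_8)); // Java 10+
    }
}
```

Passing the OutputStream in lets the caller decide where the embedded bytes go 
(a file, a buffer, a network stream) and keeps the stream lifecycle with the 
caller. That may be one reason to prefer this shape over returning a new stream.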

Thanks for taking a look!
                
> Embed Capabilities
> ------------------
>
>                 Key: TIKA-775
>                 URL: https://issues.apache.org/jira/browse/TIKA-775
>             Project: Tika
>          Issue Type: Improvement
>          Components: general, metadata
>    Affects Versions: 1.0
>         Environment: The default ExternalEmbedder requires that sed be installed.
>            Reporter: Ray Gauss II
>              Labels: embed, patch
>             Fix For: 1.1
>
>         Attachments: tika-core-embed-patch.txt, tika-parsers-embed-patch.txt
>
>
> This patch defines and implements the concept of embedding Tika metadata into
> a file stream, the reverse of extraction.
> In the tika-core project, it adds an Embedder interface and a generic,
> sed-based ExternalEmbedder implementation meant to be extended or configured.
> These classes essentially reverse the flow of the existing Parser and
> ExternalParser classes.
> In the tika-parsers project, it adds an ExternalEmbedderTest unit test, which
> uses the default ExternalEmbedder (which calls sed) to embed a value placed in
> Metadata.DESCRIPTION and then verifies the operation by parsing the resulting
> stream.
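
The round trip the unit test performs can be sketched without sed. It embeds a 
description and then parses the result to confirm the value survived. This 
self-contained Java version is hypothetical. A pure-Java append stands in for 
the external sed call, and the patch's actual sed command is not reproduced. A 
plain-text `description: ` marker line stands in for `Metadata.DESCRIPTION`. 
The helper names `embedDescription` and `parseDescription` are illustrative, 
not classes from the patch or from Tika.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class EmbedRoundTripSketch {
    // Hypothetical marker standing in for Metadata.DESCRIPTION in a plain-text file.
    static final String PREFIX = "description: ";

    // Embed step (pure-Java stand-in for the sed call): copy the original text,
    // then append one marker line. Assumes the original ends with a newline.
    static void embedDescription(String value, InputStream original, OutputStream output)
            throws IOException {
        original.transferTo(output);
        output.write((PREFIX + value + "\n").getBytes(StandardCharsets.UTF_8));
    }

    // Parse step: scan the text and return the value of the last marker line, or null.
    static String parseDescription(InputStream embedded) throws IOException {
        String found = null;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(embedded, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(PREFIX)) {
                    found = line.substring(PREFIX.length());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String expected = "Embedded by the round-trip sketch";
        InputStream original = new ByteArrayInputStream(
                "Some plain text content.\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream embedded = new ByteArrayOutputStream();
        embedDescription(expected, original, embedded);

        // Verify by parsing the embedded output rather than inspecting raw bytes.
        String actual = parseDescription(new ByteArrayInputStream(embedded.toByteArray()));
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("round trip ok: " + actual);
    }
}
```

Checking the result by parsing it, rather than comparing bytes, tests the 
write side against the existing read side. That is the pairing of embedding 
and extraction the issue describes.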
