[ 
https://issues.apache.org/jira/browse/TIKA-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655188#comment-14655188
 ] 

Tim Allison commented on TIKA-1691:
-----------------------------------

[~gostep], [~gagravarr] and [~chrismattmann], if you could take a look at my 
proposed patch on TIKA-1607, this might allow the flexibility that this issue 
seems to be looking for, namely to specify metadata value objects with greater 
refinement than just a plain string.  Given the core nature of Metadata, I'm 
very hesitant to touch that without review.  After having let it sit for a week 
or so, I think I'd prefer to un-deprecate several of the things that I 
deprecated in the latest patch.

I'm in complete agreement with Nick that the functionality you describe should 
a) already be done by Tika (general normalization) or b) be handled by client 
applications (use-case specific mappings).  I agree with Chris that Dublin 
Core doesn't cover all the metadata items that we could get from image 
metadata, for example, but we should find an alternate standard and apply that 
(if possible).  If we need some XMP scraped out of PDFs/image files to 
experiment with, we probably have some useful resources on our Rackspace 
server.

> Apache Tika for enabling metadata interoperability
> --------------------------------------------------
>
>                 Key: TIKA-1691
>                 URL: https://issues.apache.org/jira/browse/TIKA-1691
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Giuseppe Totaro
>            Assignee: Giuseppe Totaro
>              Labels: mapping, metadata
>         Attachments: mapping_example.pdf
>
>
> If I am not wrong, enabling consistent metadata across file formats is already 
> (partially) provided in Tika by relying on {{TikaCoreProperties}} and, 
> within the context of Solr, {{ExtractingRequestHandler}} (by defining how to 
> map metadata fields in {{solrconfig.xml}}). However, I am working on a new 
> component for both schema mapping (to operate on the name of metadata 
> properties) and instance transformation (to operate on the value of metadata) 
> that consists, essentially, of the following changes:
> * A wrapper around the {{Metadata}} object ({{MappedMetadata.java}}) that decorates 
> the {{set}} method (currently, line number 367 of {{Metadata.java}}) by 
> applying the given mapping functions (via configuration) before setting 
> metadata properties.
> * Basic mapping functions ({{BasicMappingUtils.java}}) that are utility 
> methods to map a set of metadata to the target schema.
> * A new {{MetadataConfig}} object that, like {{TikaConfig}}, may be 
> configured via an XML file (organized as shown in the following snippet) and 
> allows fine-grained metadata mapping by using Java reflection.
> {code:xml|title=tika-metadata.xml|borderStyle=solid}
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <properties>
>   <mappings>
>     <mapping type="type/sub-type">
>       <relation name="SOURCE_FIELD">
>         <target>TARGET_FIELD</target>
>         <expression>exclude|include|equivalent|overlap</expression>
>         <function name="FUNCTION_NAME">
>           <argument>ARGUMENT_VALUE</argument>
>         </function>
>         <cardinality>
>           <source>SOURCE_CARDINALITY</source>
>           <target>TARGET_CARDINALITY</target>
>           <order>ORDER_NUMBER</order>
>           <dependencies>
>             <field>FIELD_NAME</field>
>           </dependencies>
>         </cardinality>
>       </relation>
>     </mapping>
>     ...
>     <mapping> <!-- This contains the fallback strategy for unknown metadata -->
>       <relation>
>         ...
>       </relation>
>     </mapping>
>   </mappings>
> </properties>
> {code}
> The theoretical definition of metadata mapping is available in "[A survey of 
> techniques for achieving metadata 
> interoperability|http://www.researchgate.net/profile/Bernhard_Haslhofer/publication/220566013_A_survey_of_techniques_for_achieving_metadata_interoperability/links/02e7e533e76187c0b8000000.pdf]".
> This paper also shows some basic examples of metadata mappings.
> Currently, I am still working on some core functionality, but I have 
> already performed some experiments using a small prototype.
> By the way, I think that we should modify the method {{add}} in order to use 
> {{set}} instead of {{metadata.put}} (currently, line number 316 of 
> {{Metadata.java}}). This is a trivial change (I could create a new Jira issue 
> about that), but it would make the method consistent with the other 
> implementation of the {{add}} method and, moreover, the methods of 
> {{Metadata}} could be extended more easily.
> I would really appreciate your feedback about this proposal. If you believe 
> that it is a good idea, I could provide the code in a few days.
> Thanks a lot,
> Giuseppe
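
For illustration only, the wrapper idea in the quoted proposal (decorating set so that a configured mapping is applied before a value is stored) could be sketched in Java roughly as follows. Nothing here is Tika API or the reporter's code: SimpleMetadata, MappingRule, and MappedMetadataSketch are hypothetical stand-ins, the "dc:creator" target is just an example, and passing unmapped fields through unchanged is an assumed fallback strategy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for org.apache.tika.metadata.Metadata. The real class
// has a richer API (multi-valued fields, typed Property keys).
class SimpleMetadata {
    private final Map<String, String> values = new LinkedHashMap<>();

    public void set(String name, String value) {
        values.put(name, value);
    }

    public String get(String name) {
        return values.get(name);
    }
}

// Hypothetical rule: a target field name (schema mapping) plus a function
// applied to the value (instance transformation).
final class MappingRule {
    final String target;
    final UnaryOperator<String> function;

    MappingRule(String target, UnaryOperator<String> function) {
        this.target = target;
        this.function = function;
    }
}

// Sketch of the wrapper: applies the configured rule, then delegates to set.
class MappedMetadataSketch extends SimpleMetadata {
    private final Map<String, MappingRule> rules = new LinkedHashMap<>();

    void addRule(String source, MappingRule rule) {
        rules.put(source, rule);
    }

    @Override
    public void set(String name, String value) {
        MappingRule rule = rules.get(name);
        if (rule == null) {
            // Assumed fallback: unknown fields are stored unchanged.
            super.set(name, value);
            return;
        }
        super.set(rule.target, rule.function.apply(value));
    }

    public static void main(String[] args) {
        MappedMetadataSketch metadata = new MappedMetadataSketch();
        metadata.addRule("Author", new MappingRule("dc:creator", String::trim));
        metadata.set("Author", "  Jane Doe ");
        System.out.println(metadata.get("dc:creator"));
    }
}
```

Because every write goes through the overridden set, an add method that delegates to set (as suggested in the quoted text) would pick up the same mapping without further changes.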



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
