[
https://issues.apache.org/jira/browse/TIKA-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635528#comment-14635528
]
Nick Burch commented on TIKA-1691:
----------------------------------
Could you describe the use case for this?
(My initial thought is along the lines that if this is for a well-known format,
we should probably be setting this as standard, while if it's for translating
from Tika core metadata into an end-user's specific metadata structure, then
that might be better done on the getter side rather than setter. Use cases
would help clarify if those thoughts are right or not though!)
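To make the getter-side option concrete, here is a minimal Java sketch. It does not use the real Tika API: a plain Map<String, String[]> stands in for Tika's Metadata, and the class name GetterSideMapper, the method translate, and the field names are all hypothetical. The extracted metadata is left untouched, and names are translated only when the end user reads them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: translates extracted metadata into an end user's
// schema at read time. A Map<String, String[]> stands in for Tika's
// Metadata object (assumption made for illustration only).
class GetterSideMapper {
    private final Map<String, String> nameMapping; // source field -> target field

    GetterSideMapper(Map<String, String> nameMapping) {
        this.nameMapping = new LinkedHashMap<>(nameMapping);
    }

    // Returns a new map keyed by target names. Fields without a mapping
    // keep their original name; the input map is not modified.
    Map<String, String[]> translate(Map<String, String[]> extracted) {
        Map<String, String[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : extracted.entrySet()) {
            String target = nameMapping.getOrDefault(e.getKey(), e.getKey());
            out.put(target, e.getValue().clone());
        }
        return out;
    }
}

class GetterSideDemo {
    public static void main(String[] args) {
        Map<String, String[]> extracted = new LinkedHashMap<>();
        extracted.put("dc:creator", new String[] {"Jane Doe"});
        extracted.put("Content-Type", new String[] {"application/pdf"});

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("dc:creator", "author");

        Map<String, String[]> view = new GetterSideMapper(mapping).translate(extracted);
        System.out.println(view.keySet());
    }
}
```

In this sketch, two source fields mapped to the same target would overwrite each other, so a real implementation would need an explicit merge policy.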
> Apache Tika for enabling metadata interoperability
> --------------------------------------------------
>
> Key: TIKA-1691
> URL: https://issues.apache.org/jira/browse/TIKA-1691
> Project: Tika
> Issue Type: New Feature
> Reporter: Giuseppe Totaro
> Assignee: Giuseppe Totaro
> Labels: mapping, metadata
>
> If I am not wrong, enabling consistent metadata across file formats is already
> (partially) provided in Tika by relying on {{TikaCoreProperties}} and,
> within the context of Solr, {{ExtractingRequestHandler}} (by defining how to
> map metadata fields in {{solrconfig.xml}}). However, I am working on a new
> component for both schema mapping (to operate on the name of metadata
> properties) and instance transformation (to operate on the value of metadata)
> that consists, essentially, of the following changes:
> * A wrapper around the {{Metadata}} object ({{MappedMetadata.java}}) that decorates
> the {{set}} method (currently, line number 367 of {{Metadata.java}}) by
> applying the given mapping functions (via configuration) before setting
> metadata properties.
> * Basic mapping functions ({{BasicMappingUtils.java}}) that are utility
> methods to map a set of metadata to the target schema.
> * A new {{MetadataConfig}} object that, like {{TikaConfig}}, may be
> configured via an XML file (organized as shown in the following snippet) and
> allows fine-grained metadata mapping by using Java reflection.
> {code:xml|title=tika-metadata.xml|borderStyle=solid}
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <properties>
>   <mappings>
>     <mapping type="type/sub-type">
>       <relation name="SOURCE_FIELD">
>         <target>TARGET_FIELD</target>
>         <expression>exclude|include|equivalent|overlap</expression>
>         <function name="FUNCTION_NAME">
>           <argument>ARGUMENT_VALUE</argument>
>         </function>
>         <cardinality>
>           <source>SOURCE_CARDINALITY</source>
>           <target>TARGET_CARDINALITY</target>
>           <order>ORDER_NUMBER</order>
>           <dependencies>
>             <field>FIELD_NAME</field>
>           </dependencies>
>         </cardinality>
>       </relation>
>     </mapping>
>     ...
>     <mapping> <!-- This contains the fallback strategy for unknown metadata -->
>       <relation>
>         ...
>       </relation>
>     </mapping>
>   </mappings>
> </properties>
> {code}
> The theoretical definition of metadata mapping is available in "[A survey of
> techniques for achieving metadata
> interoperability|http://www.researchgate.net/profile/Bernhard_Haslhofer/publication/220566013_A_survey_of_techniques_for_achieving_metadata_interoperability/links/02e7e533e76187c0b8000000.pdf]".
> The paper also shows some basic examples of metadata mappings.
> Currently, I am still working on some core functionality, but I have
> already run some experiments with a small prototype.
> By the way, I think that we should modify the {{add}} method to use
> {{set}} instead of {{metadata.put}} (currently, line number 316 of
> {{Metadata.java}}). This is a trivial change (I could create a new Jira issue
> for it), but it would be consistent with the other implementation
> of the {{add}} method and, moreover, would make the methods of {{Metadata}}
> easier to extend.
> I would really appreciate your feedback about this proposal. If you believe
> that it is a good idea, I could provide the code in a few days.
> Thanks a lot,
> Giuseppe
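For comparison, the setter-side wrapper described in the quoted proposal could look roughly like the Java sketch below. This is not the reporter's code and not the Tika API: SketchMetadata is a stand-in for Tika's Metadata that models only single-valued set and get, and MappedMetadataSketch, its constructor arguments, and the field names are hypothetical. The mapping is passed in directly rather than loaded from an XML configuration file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Stand-in for Tika's Metadata (assumption): only single-valued
// set/get are modelled here.
class SketchMetadata {
    private final Map<String, String> values = new HashMap<>();

    public void set(String name, String value) {
        values.put(name, value);
    }

    public String get(String name) {
        return values.get(name);
    }
}

// Hypothetical wrapper: renames the field (schema mapping) and transforms
// the value (instance transformation) before storing it.
class MappedMetadataSketch extends SketchMetadata {
    private final Map<String, String> targetNames;
    private final Map<String, UnaryOperator<String>> functions;

    MappedMetadataSketch(Map<String, String> targetNames,
                         Map<String, UnaryOperator<String>> functions) {
        this.targetNames = new HashMap<>(targetNames);
        this.functions = new HashMap<>(functions);
    }

    @Override
    public void set(String name, String value) {
        // Fields without a mapping keep their name and value unchanged.
        String target = targetNames.getOrDefault(name, name);
        UnaryOperator<String> fn = functions.getOrDefault(name, UnaryOperator.identity());
        super.set(target, value == null ? null : fn.apply(value));
    }
}

class MappedMetadataDemo {
    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("Author", "dc:creator");
        Map<String, UnaryOperator<String>> fns = new HashMap<>();
        fns.put("Author", String::trim);

        SketchMetadata md = new MappedMetadataSketch(names, fns);
        md.set("Author", "  Jane Doe ");
        System.out.println(md.get("dc:creator"));
    }
}
```

Because the mapping is applied on write, callers here must read values back under the target name; the source name is no longer present in the stored metadata.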
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)