[ https://issues.apache.org/jira/browse/TIKA-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900196#comment-17900196 ]
Peter Wyatt commented on TIKA-4357:
-----------------------------------
For PDF XMP (*Metadata* streams), can the namespace include something about
which PDF object the Metadata stream was attached to?
There can be multiple XMP streams, so knowing what comes from where is very
helpful.
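A minimal sketch of what this could look like, assuming a hypothetical key layout `xmp:<object type>:<object number>-<generation>:<property>`. This is not an existing Tika API: `XmpKeyScheme`, `XmpSource`, and `key` are made up for illustration, and a plain `Map` stands in for Tika's `Metadata` so the example runs standalone (Java 16+ for the record).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Tika's API: a key scheme that records which PDF
// object an XMP Metadata stream was attached to.
public class XmpKeyScheme {

    // Identifies the PDF object (e.g. the document catalog, a page, an image
    // XObject) whose /Metadata entry pointed at the XMP stream.
    record XmpSource(String objectType, int objectNumber, int generation) {}

    // Builds a key such as "xmp:catalog:1-0:dc:title". The "xmp:" prefix and
    // the segment layout are illustrative assumptions.
    static String key(XmpSource src, String xmpProperty) {
        return "xmp:" + src.objectType() + ":" + src.objectNumber() + "-"
                + src.generation() + ":" + xmpProperty;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        XmpSource catalog = new XmpSource("catalog", 1, 0);
        XmpSource image = new XmpSource("image", 42, 0);
        // Two XMP streams carrying the same property no longer collide.
        metadata.put(key(catalog, "dc:title"), "Annual Report");
        metadata.put(key(image, "dc:title"), "Company logo");
        metadata.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Keying on object number and generation mirrors how PDF indirect objects are referenced; whether the object-type segment is worth carrying in addition to the object reference is a judgment call.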
> Ensure namespace prefixes in metadata keys in 4.x
> -------------------------------------------------
>
> Key: TIKA-4357
> URL: https://issues.apache.org/jira/browse/TIKA-4357
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Labels: 4x
>
> There are several places in the codebase where we are mindlessly trusting a
> file's metadata key without namespace prefixing. This is dangerous because
> user data could overwrite metadata from Tika or do other unpleasant things.
> There are other places where we were transitioning to namespace prefixes and
> left in the legacy keys without prefixes
> (https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParser.java#L633).
>
> In 4.x, we should look through the codebase and ensure that we are prefixing
> custom metadata keys.
> A related idea is that, rather than having format-specific "custom:" prefixes,
> we use a general prefix for all file formats... WDYT? For those parsers where
> we want to distinguish the raw source of the information -- I'm looking at
> you pdf docinfo and pdf xmp! -- we could use two keys.
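A minimal sketch of the general-prefix idea, assuming one shared `custom:` prefix plus a second, source-specific key (`pdf:docinfo:custom:` / `pdf:xmp:custom:`) for PDF. The prefix strings, `CustomKeyPrefixer`, and `addCustom` are hypothetical, not Tika's API; a `Map<String, List<String>>` stands in for Tika's multi-valued `Metadata` (Java 14+ for the switch expression).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Tika's API: file-supplied keys always get a general
// "custom:" prefix, and PDF keys additionally get a source-specific key so the
// raw origin (DocInfo vs. XMP) is preserved.
public class CustomKeyPrefixer {

    static final String CUSTOM_PREFIX = "custom:";

    enum Source { PDF_DOCINFO, PDF_XMP }

    // Assumed source-specific prefixes for the second key.
    static String sourcePrefix(Source source) {
        return switch (source) {
            case PDF_DOCINFO -> "pdf:docinfo:";
            case PDF_XMP -> "pdf:xmp:";
        };
    }

    // Records a file-supplied key/value under both keys. The file's key is
    // never stored bare, so it cannot overwrite a key the parser itself sets.
    static void addCustom(Map<String, List<String>> metadata, Source source,
                          String fileKey, String value) {
        metadata.computeIfAbsent(CUSTOM_PREFIX + fileKey, k -> new ArrayList<>()).add(value);
        metadata.computeIfAbsent(sourcePrefix(source) + CUSTOM_PREFIX + fileKey,
                k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.computeIfAbsent("dc:title", k -> new ArrayList<>()).add("Title set by the parser");
        // A DocInfo key that mimics a Tika key stays namespaced.
        addCustom(metadata, Source.PDF_DOCINFO, "dc:title", "spoofed title");
        // The same custom key from two sources: one general key, two source keys.
        addCustom(metadata, Source.PDF_DOCINFO, "Company", "Acme");
        addCustom(metadata, Source.PDF_XMP, "Company", "Acme Corp");
        metadata.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The general key collects values from every source, while the source-specific keys keep track of where each value came from; using a multi-valued map avoids one source silently overwriting the other.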
--
This message was sent by Atlassian Jira
(v8.20.10#820010)