Tim Allison created TIKA-4449:
---------------------------------

Summary: Improve xmp metadata key precision for PDFs
Key: TIKA-4449
URL: https://issues.apache.org/jira/browse/TIKA-4449
Project: Tika
Issue Type: Task
Reporter: Tim Allison
PDFs (and other file formats) may contain conflicting values for the same field, for example "title" or "author", stored in different places within the file. Tika's parsers typically pick one source over the other and normalize the keys to Dublin Core or other standards.

[~peterhoogendijk] and (likely) other users want to be able to tell whether a given value came from the XMP metadata or from the DocInfo dictionary.

This is follow-on work from TIKA-4444. The proposal is to add new metadata keys that indicate when Dublin Core information comes directly from XMP.
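To make the proposal concrete, here is a minimal Java sketch under a simplified model: XMP Dublin Core values and DocInfo values arrive as plain string maps keyed by Dublin Core field names ("title", "creator"). The class name MetadataProvenance, the key names "xmp:dc:<field>" and "pdf:docinfo:<field>", and the XMP-first preference for the normalized key are all hypothetical illustrations, not Tika's actual API, key scheme, or selection policy.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: keep a normalized "dc:<field>" value, but also record
 * each source's raw value under a source-specific key so callers can see
 * where a value came from. Key prefixes are illustrative assumptions.
 */
public class MetadataProvenance {

    // Assumed prefixes; not confirmed Tika key names.
    static final String XMP_DC_PREFIX = "xmp:dc:";
    static final String DOCINFO_PREFIX = "pdf:docinfo:";
    static final String DC_PREFIX = "dc:";

    static Map<String, String> merge(Map<String, String> xmpDc, Map<String, String> docInfo) {
        Map<String, String> out = new LinkedHashMap<>();

        // Record DocInfo values under their own keys, e.g. "pdf:docinfo:title".
        for (Map.Entry<String, String> e : docInfo.entrySet()) {
            out.put(DOCINFO_PREFIX + e.getKey(), e.getValue());
        }
        // Record XMP Dublin Core values under "xmp:dc:<field>".
        for (Map.Entry<String, String> e : xmpDc.entrySet()) {
            out.put(XMP_DC_PREFIX + e.getKey(), e.getValue());
        }

        // Normalized "dc:<field>": this sketch prefers XMP when present and
        // falls back to DocInfo (an assumed policy for illustration only).
        Set<String> fields = new LinkedHashSet<>(xmpDc.keySet());
        fields.addAll(docInfo.keySet());
        for (String field : fields) {
            String value = xmpDc.containsKey(field) ? xmpDc.get(field) : docInfo.get(field);
            out.put(DC_PREFIX + field, value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> xmp = Map.of("title", "XMP Title");
        Map<String, String> doc = Map.of("title", "DocInfo Title", "creator", "Jane Doe");
        // Prints every key with its value; the two title sources stay distinguishable.
        merge(xmp, doc).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Keeping the source-specific keys alongside the normalized dc:* key means existing consumers of "dc:title" are unaffected, while users who care about provenance can compare the XMP and DocInfo values directly.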