Tim Allison created TIKA-4449:
---------------------------------

             Summary: Improve xmp metadata key precision for PDFs
                 Key: TIKA-4449
                 URL: https://issues.apache.org/jira/browse/TIKA-4449
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


PDFs (and other file formats) may contain conflicting values for the same 
field, for example the "title" field or the "author" field.

Tika's parsers typically pick one source over another and normalize the keys to 
Dublin Core or other standards.

[~peterhoogendijk] and (likely) other users want to be able to identify 
whether a given piece of information comes from the XMP or the docinfo. This is 
follow-on work from TIKA-4444. The proposal is to add new metadata keys to 
specify when Dublin Core information comes directly from XMP.
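
To illustrate what source-specific keys would allow, here is a minimal sketch 
that treats extracted metadata as a plain key/value mapping. The key names 
"xmp:dc:title" and "docinfo:title", the helper functions, and the sample values 
are all made up for illustration; the actual keys are whatever this issue ends 
up defining.

```python
# Hypothetical key names, for illustration only. The real keys are up to
# the work on this issue.
NORMALIZED_TITLE = "dc:title"      # the single value the parser settled on
XMP_TITLE = "xmp:dc:title"         # hypothetical: title read directly from XMP
DOCINFO_TITLE = "docinfo:title"    # hypothetical: title read from the docinfo


def title_sources(metadata):
    """Return {source name: title} for each source present in the metadata."""
    sources = {}
    if XMP_TITLE in metadata:
        sources["xmp"] = metadata[XMP_TITLE]
    if DOCINFO_TITLE in metadata:
        sources["docinfo"] = metadata[DOCINFO_TITLE]
    return sources


def has_conflict(metadata):
    """True when both sources carry a title and the two values differ."""
    sources = title_sources(metadata)
    return len(sources) == 2 and sources["xmp"] != sources["docinfo"]


# Made-up sample metadata for a PDF whose XMP and docinfo disagree.
example = {
    NORMALIZED_TITLE: "Annual Report",
    XMP_TITLE: "Annual Report",
    DOCINFO_TITLE: "Microsoft Word - report_final.docx",
}

print(title_sources(example))
print(has_conflict(example))
```

With only the normalized key, a consumer cannot tell which source supplied the 
value or whether the sources disagreed; separate keys make both checks a simple 
lookup.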



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
