[PDFdev] Use of Doc Info & XMP - Rather long post

Lori DeFurio Tue, 14 Oct 2003 00:30:29 -0700


Hello,

As you probably are aware, we're working on the specifications for Acrobat 7. It is important to Adobe to solicit feedback from the developer community as we work on requirements and specifications.

I have a proposal for your review on use and storage of metadata. Please reply to me directly if you are affected by this.

(FYI - if I don't get any responses, I will assume this is a "non-issue")

In the current product and specification, there are two different places to store a list of keywords associated with a document.

The document Info dictionary's Keywords entry stores a list of keywords as one big string; PDF and Acrobat don't dictate any particular way of separating this string into individual keywords.

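For example, here is a typical document Info dictionary (this particular one carries no Keywords entry; when present, it is a single string alongside the other entries):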

<</ModDate(D:20031007232810-04'00')
/CreationDate(D:20031007232810-04'00')
/Title(Information Management Research)
/Creator(PScript5.dll Version 5.2)
/Producer(Acrobat Distiller 6.0 \(Windows\))
/Author(Lori DeFurio)>>

This single string is also stored in the XMP metadata as the Keywords entry in the pdf: schema. XMP additionally defines a second keyword list, stored in the subject entry of the dc: (Dublin Core) schema, as an actual structured list of separate keywords:

<dc:subject>
<rdf:Bag>
<rdf:li>Keyword1</rdf:li>
<rdf:li>keyword2</rdf:li>
<rdf:li>keyword3</rdf:li>
</rdf:Bag>
</dc:subject>
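
To make the difference between the two storage forms concrete, here is a minimal sketch of reading the structured list back out of a packet fragment like the one above. It uses only Python's standard library. The function name read_dc_subject and the wrapping rdf:Description element are assumptions made for illustration (the fragment above omits its namespace declarations); this is not how Acrobat itself reads XMP.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fragment from the post, wrapped in an rdf:Description element
# (an assumption) so that the rdf: and dc: prefixes are declared.
SAMPLE = f"""<rdf:Description xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
<dc:subject>
<rdf:Bag>
<rdf:li>Keyword1</rdf:li>
<rdf:li>keyword2</rdf:li>
<rdf:li>keyword3</rdf:li>
</rdf:Bag>
</dc:subject>
</rdf:Description>"""


def read_dc_subject(xml_text):
    """Return the dc:subject keywords as a list of strings (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced tags as {namespace-uri}localname.
    path = f".//{{{DC_NS}}}subject/{{{RDF_NS}}}Bag/{{{RDF_NS}}}li"
    return [(li.text or "").strip() for li in root.findall(path)]


print(read_dc_subject(SAMPLE))
```

Each rdf:li comes back as its own list element, whereas the Info dictionary form hands the consumer one undivided string.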

Currently, Acrobat makes no attempt to relate these two sets of keywords in any way. This means that when you type keywords into the Description panel of Acrobat's Document Properties dialog, they don't show up in the Description panel of the Document Metadata dialog (and vice versa). We have solid evidence that this situation is confusing to our desktop users.

We are considering a change to Acrobat (and the Adobe PDF Library) so that the two lists of keywords will be automatically kept in sync (much as Author and Title have been since Acrobat 5).

We propose to do this as follows:

1. When the Info dictionary Keywords entry has to be transcribed into the dc:subject entry in XMP, a heuristic will be applied to separate the string into individual keywords based on separator characters such as commas, semicolons, and spaces.

2. When the dc:subject entry in XMP is to be transcribed into the Info dictionary Keywords entry, the individual keyword entries will be concatenated together with semicolons as a separator character.
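
The two steps above could be sketched roughly as follows. This is only an illustration of the idea, not the heuristic Acrobat would ship: the function names, the separator precedence (semicolons first, then commas, then whitespace), and the space added after each semicolon when joining are all assumptions.

```python
import re


def split_keywords(info_keywords):
    """Split an Info dictionary Keywords string into individual keywords.

    Assumed precedence: if the string contains a semicolon, split on
    semicolons only; otherwise on commas; otherwise on whitespace. This
    keeps multi-word keywords intact when an explicit separator is used.
    """
    for pattern in (r";", r",", r"\s+"):
        if re.search(pattern, info_keywords):
            parts = re.split(pattern, info_keywords)
            break
    else:
        # No separator character found: treat the string as one keyword.
        parts = [info_keywords]
    # Trim surrounding whitespace and drop empty entries.
    return [p.strip() for p in parts if p.strip()]


def join_keywords(subjects, separator="; "):
    """Concatenate dc:subject entries into one Keywords string.

    The proposal specifies semicolons; the trailing space is an assumption.
    """
    return separator.join(subjects)


print(split_keywords("information management; research"))
print(join_keywords(["pdf", "metadata", "XMP"]))
```

One consequence worth noting: a keyword that itself contains a separator character does not survive a round trip unchanged, which is the sort of case where feedback would be useful.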

As with the current situation in Acrobat and Adobe PDF Library, this synchronization is carried out the first time that any metadata is inspected using the PDDocInfo API or the XAP API, and the synchronized values are saved in the document when the document is saved.

So, here's my question - would an automatic process along these lines cause problems for your development or solutions? We look forward to your feedback.

thanks,

lori & the Acrobat Engineering team

Lori DeFurio

Developer Evangelist, Adobe PDF

Adobe Systems Incorporated
