Hello,
As you are probably aware, we're working on the specifications for Acrobat 7. Adobe considers it important to solicit feedback from the developer community as we work on requirements and specifications.
I have a proposal for your review on the use and storage of metadata. Please reply to me directly if it would affect you.
(FYI: if I don't get any responses, I will assume this is a "non-issue".)
In the current product and specification, there are two different places to store a list of keywords associated with a document.
The document Info dictionary's Keywords entry stores a list of keywords as one big string; PDF and Acrobat don't dictate any particular way of separating this string into individual keywords.
For example:
<</ModDate(D:20031007232810-04'00')
/CreationDate(D:20031007232810-04'00')
/Title(Information Management Research)
/Creator(PScript5.dll Version 5.2)
/Producer(Acrobat Distiller 6.0 \(Windows\))
/Author(Lori DeFurio)>>
The same single string is also stored in the XMP metadata as the Keywords entry in the pdf: schema. XMP additionally defines a list of keywords, stored in the subject entry of the dc: (Dublin Core) schema, as a true structured list of separate keywords:
<dc:subject>
  <rdf:Bag>
    <rdf:li>keyword1</rdf:li>
    <rdf:li>keyword2</rdf:li>
    <rdf:li>keyword3</rdf:li>
  </rdf:Bag>
</dc:subject>
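To make the contrast with the single Info string concrete, here is a minimal sketch that reads a dc:subject bag back into a list of separate keywords. It assumes the fragment is wrapped in an rdf:Description that declares the standard RDF and Dublin Core namespace URIs (the bare fragment in this message omits them), and it uses only Python's standard xml.etree.ElementTree. It is a plain-XML reading, not a full XMP/RDF parser: it ignores other RDF serialization forms and container types.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core as used in XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_dc_subject(xml_text: str) -> list[str]:
    """Return the rdf:li items of the first dc:subject bag, in document order."""
    root = ET.fromstring(xml_text)
    # Element.iter() also visits the root itself, so a bare dc:subject works too.
    subject = next(root.iter(f"{{{NS['dc']}}}subject"), None)
    if subject is None:
        return []
    # Each rdf:li is one separate keyword; no string splitting is needed.
    return [(li.text or "").strip() for li in subject.findall("rdf:Bag/rdf:li", NS)]

# Assumed sample: the fragment above, wrapped with namespace declarations.
sample = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:subject>
    <rdf:Bag>
      <rdf:li>keyword1</rdf:li>
      <rdf:li>keyword2</rdf:li>
      <rdf:li>keyword3</rdf:li>
    </rdf:Bag>
  </dc:subject>
</rdf:Description>"""

print(read_dc_subject(sample))  # ['keyword1', 'keyword2', 'keyword3']
```

Because each keyword is its own rdf:li element, no separator convention is involved on the XMP side; that is exactly what the Info dictionary string lacks.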
Currently, Acrobat makes no attempt to relate these two sets of keywords in any way. As a result, keywords typed into the Description panel of Acrobat's Document Properties dialog don't show up in the Description panel of the Document Metadata dialog (and vice versa). We have solid evidence that this situation confuses our desktop users.
We are considering a change to Acrobat (and the Adobe PDF Library) so that the two lists of keywords are automatically kept in sync, much as Author and Title have been since Acrobat 5.
We propose to do this as follows:
1. When the Info dictionary Keywords entry has to be transcribed into the dc:subject entry in XMP, a heuristic will be applied to separate the string into individual keywords based on separator characters such as commas, semicolons, and spaces.
2. When the dc:subject entry in XMP has to be transcribed into the Info dictionary Keywords entry, the individual keyword entries will be concatenated, with a semicolon as the separator character.
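The two transcription steps can be sketched as a pair of functions. This is an illustration under stated assumptions, not Acrobat's actual heuristic: the proposal lists separators only "such as" commas, semicolons, and spaces, so this sketch assumes that commas and semicolons take priority (to preserve multi-word keywords) and that whitespace is used only when neither is present. It also assumes a semicolon followed by a space when joining; the function names are hypothetical.

```python
import re

# Assumed heuristic separators (the proposal does not fix the exact set).
_LIST_SEPARATORS = re.compile(r"[;,]")
_WHITESPACE = re.compile(r"\s+")

# Assumption: semicolon plus a space; the proposal only says "semicolons".
JOIN_SEPARATOR = "; "

def info_keywords_to_subject(keywords: str) -> list[str]:
    """Step 1 (sketch): split the Info Keywords string into separate keywords."""
    if _LIST_SEPARATORS.search(keywords):
        # Commas/semicolons present: split only on them so "a b, c" keeps "a b".
        parts = _LIST_SEPARATORS.split(keywords)
    else:
        # No list punctuation: fall back to splitting on runs of whitespace.
        parts = _WHITESPACE.split(keywords)
    # Trim each piece and drop empties (e.g. from "a,,b" or a trailing ";").
    return [p.strip() for p in parts if p.strip()]

def subject_to_info_keywords(subject: list[str]) -> str:
    """Step 2 (sketch): concatenate dc:subject entries with semicolons."""
    return JOIN_SEPARATOR.join(subject)

print(info_keywords_to_subject("PDF, metadata; XMP"))               # ['PDF', 'metadata', 'XMP']
print(info_keywords_to_subject("PDF metadata XMP"))                 # ['PDF', 'metadata', 'XMP']
print(info_keywords_to_subject("information management, research"))  # ['information management', 'research']
print(subject_to_info_keywords(["PDF", "metadata", "XMP"]))          # PDF; metadata; XMP
```

A list written by step 2 survives a round trip through step 1 as long as no keyword itself contains a comma or semicolon. The whitespace fallback is where the heuristic can guess wrong: a string like "Information Management Research" meant as a single keyword would come back as three.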
As is already the case in Acrobat and the Adobe PDF Library, this synchronization would be carried out the first time any metadata is inspected using the PDDocInfo API or the XAP API, and the synchronized values would be saved in the document when the document is saved.
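The "synchronize on first inspection, persist on save" timing can be modeled as a lazy, run-once step. The sketch below is a hypothetical in-memory model, not the PDDocInfo or XAP API: the class, its methods, and the `dirty` flag are illustrative names. It assumes synchronization only fills in whichever side is empty; the proposal does not say which side wins when both are populated and disagree, so this sketch leaves both untouched in that case. It uses a simplified comma/semicolon split as a stand-in for the fuller heuristic.

```python
import re
from dataclasses import dataclass, field

def _split(keywords: str) -> list[str]:
    # Simplified stand-in for the step-1 heuristic: split on commas/semicolons.
    return [k.strip() for k in re.split(r"[;,]", keywords) if k.strip()]

@dataclass
class KeywordMetadata:
    """Hypothetical model of a document's two keyword stores."""
    info_keywords: str = ""                               # Info dictionary Keywords
    dc_subject: list[str] = field(default_factory=list)   # XMP dc:subject bag
    dirty: bool = False     # True when sync changed values that a save would write
    _synced: bool = field(default=False, init=False, repr=False)

    def _sync_once(self) -> None:
        # Runs only on the first inspection of either keyword store.
        if self._synced:
            return
        self._synced = True
        if self.info_keywords and not self.dc_subject:
            self.dc_subject = _split(self.info_keywords)
            self.dirty = True
        elif self.dc_subject and not self.info_keywords:
            self.info_keywords = "; ".join(self.dc_subject)  # assumed "; " separator
            self.dirty = True
        # Both set (or both empty): conflict policy unspecified, leave as-is.

    def get_info_keywords(self) -> str:
        self._sync_once()
        return self.info_keywords

    def get_dc_subject(self) -> list[str]:
        self._sync_once()
        return list(self.dc_subject)

doc = KeywordMetadata(info_keywords="PDF; metadata")
print(doc.dirty)            # False: nothing inspected yet
print(doc.get_dc_subject())  # ['PDF', 'metadata']
print(doc.dirty)            # True: a save would now write the synchronized XMP
```

Deferring the work until first inspection means documents that are opened but never queried for metadata are not modified; the `dirty` flag stands in for "write the synchronized values back when the document is saved."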
So, here's my question: would an automatic process along these lines cause problems for your development or solutions? We look forward to your feedback.
Thanks,
Lori & the Acrobat Engineering team
Lori DeFurio
Developer Evangelist, Adobe PDF
Adobe Systems Incorporated
