RE: [PDFdev] PDF Info

Leonard Rosenthol Mon, 19 Apr 2004 11:47:46 -0700



At 02:19 PM 4/19/2004, Mark Storer wrote:

I suspect you're running afoul of the "Metadata" XML which is a stream in
the catalog object (CosDocGetRoot()).  You may have to modify the metadata
too.  Or instead.  I understand that Acrobat will believe the metadata over
the info dict.

Acrobat 5 prefers Info dict over XMP/XML. Acrobat 6 prefers XMP/XML over Info dict.

And I still don't like having two sources of information like that.  What
was wrong with the original doc-info fields?

A number of things...

1) Not flexible/extensible enough.
2) Outside the /Catalog structure of the PDF - thus making it indirectly related to the document.
3) Not consistent with adding per-object metadata.
4) Non-standards based for integration with databases/search engines, etc. (etc.)

They could have added more
fields, structured fields even.  But no... now we have two sources for
author/title/etc and have to keep them synchronized either through the
PDDocGetInfo/PDDocSetInfo calls, or manually.

My hope is that DocInfo is going away...

If I'm right, you'll have to get the stream, find the <pdf:author> (that's
an educated guess) tag and remove it, then put the modified stream back into
the Metadata...

Anyone who is writing software to write/modify PDF document info MUST now be sure to keep the two sets of information in sync!

This
tells me that we not only have to keep things synchronized, but we should
check two places for a field that may be in either one or the other, or
both.

Correct. If you are reading, you should first read the /Metadata and then, if a field is not present there, fall back to the /Info fields.
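
As a rough illustration of that read order, here is a minimal Python sketch. It assumes you have already pulled the /Metadata stream out of the file as a string and the /Info dictionary as a plain dict; read_title, its parameters, and SAMPLE_XMP are hypothetical names made up for this example, not part of any PDF library. It also only handles the element form of dc:title, not every serialization an XMP packet may use.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP for RDF and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_title(xmp_xml, info):
    """Return the title from the XMP packet, falling back to the /Info dict.

    xmp_xml: the /Metadata stream as a string, or None if the file has none.
    info:    a plain dict standing in for the /Info dictionary (assumption).
    """
    if xmp_xml:
        try:
            root = ET.fromstring(xmp_xml)
        except ET.ParseError:
            root = None  # unreadable packet: treat it as absent
        if root is not None:
            li = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
            if li is not None and li.text and li.text.strip():
                return li.text.strip()
    # Nothing usable in /Metadata, so fall back to /Info.
    return info.get("Title")


# A tiny hand-written packet, for illustration only.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">From XMP</rdf:li></rdf:Alt></dc:title>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

title = read_title(SAMPLE_XMP, {"Title": "From Info"})
```

The same pattern would apply per field: look in the XMP first, and only consult /Info when the XMP has no value for that field.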

Do keep in mind that there are multiple namespaces within the XMP that also need to be kept in sync - DublinCore, pdf, etc.
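
To make the namespace point concrete, here is a hedged Python sketch that checks one packet for properties that carry the same information in different namespaces but disagree. The MIRRORS table is an assumption for illustration (it pairs dc:creator with pdf:Author, following the guess earlier in this thread); check which properties your own packets actually contain. find_conflicts is a hypothetical helper, it only looks at the element form of each property, and it compares values literally, so a multi-author dc:creator list will not match a single joined pdf:Author string.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Assumed mapping: logical field -> places in the packet that may hold it.
MIRRORS = {
    "author": [".//dc:creator/rdf:Seq/rdf:li", ".//pdf:Author"],
}


def find_conflicts(xmp_xml):
    """Return {field: [distinct values]} for fields whose copies disagree."""
    root = ET.fromstring(xmp_xml)
    conflicts = {}
    for field, paths in MIRRORS.items():
        seen = []
        for path in paths:
            texts = tuple(
                el.text.strip()
                for el in root.findall(path, NS)
                if el.text and el.text.strip()
            )
            if texts:  # ignore namespaces that do not carry the field
                seen.append(texts)
        if len(set(seen)) > 1:
            conflicts[field] = sorted(set(seen))
    return conflicts
```

An empty result means the copies that exist agree with each other; it says nothing about whether they also agree with /Info, which needs its own check.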

And could they please throw a couple more namespaces into that XML?  I just
don't think *S*E*V*E*N* is enough!  I weep for the poor fools who actually
have to parse it.

It's actually pretty easy, if you understand XML concepts...

Or you could just blow the metadata away.

You could - that is a valid approach to the problem...

Acrobat will oh-so-kindly add it back in the first chance it
gets, but I'd hope it would at least be generated from your data.

Yes, Acrobat will always put /Metadata back in. Yes, it is generated from /Info when present.

Leonard

---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)

