
At 03:30 PM 4/19/2004, Mark Storer wrote:
> Preface:  I'm not trying to be antagonistic, I have genuine questions.
> Sounds like you're more familiar with this territory than I.

I guess ;). I've been doing XML longer than I've been doing PDF...believe it or not...



>> 1) Not flexible/extensible enough.

> What data can you represent in XML that you can't do in Cos?

There isn't necessarily anything that XML can represent that you can't in Cos - there are just many things you can represent BETTER in XML.


For example, Cos has no separation of "attributes" and "values", unless you want to limit yourself to a CosStream (the CosStreamDict being the attrs and the stream data being the value). BUT in that case, you can't have Cos objs inside the stream (well, PDF 1.5 changes that - sort of).
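To make that concrete, a stream object looks roughly like this (object number, length, and contents invented for illustration):

  4 0 obj
  << /Type /Metadata /Subtype /XML /Length 1234 >>   % the "attributes" (stream dictionary)
  stream
  ...the raw stream data - the "value" - which can't itself contain Cos objects...
  endstream
  endobj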

XML does a better job of representing hierarchy - things like /Kids and /Parent in Cos are one approach - but they aren't even used consistently within a single PDF.
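For instance, the page tree and the outline tree in the very same file express parent/child relationships two different ways (object numbers invented):

  % page tree: children gathered in a /Kids array
  2 0 obj << /Type /Pages /Kids [4 0 R 5 0 R] /Count 2 >> endobj
  4 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj

  % outline tree: children kept as a /First../Next linked list
  10 0 obj << /Type /Outlines /First 11 0 R /Last 12 0 R /Count 2 >> endobj
  11 0 obj << /Title (Chapter 1) /Parent 10 0 R /Next 12 0 R >> endobj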


> Key/value pairs, strings, numbers, binary data. All exist in both formats, and both
> seem equally extensible, at least to my eye.

And in some ways, Cos is better - since it is a strongly typed "language", while in XML everything is a string...unless you have a Schema to help you understand what to do with the strings. Cos also has the concept of indirect references, vs. things such as XPath on the XML side.
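A made-up illustration of that difference:

  % Cos: every value carries its type (name, integer, real, boolean, indirect reference)
  << /Kind /Report /PageCount 12 /Scale 1.5 /Draft true /Pages 2 0 R >>

  <!-- XML: the same data is just character strings until a Schema says otherwise -->
  <doc kind="Report" pageCount="12" scale="1.5" draft="true" pages="obj2"/>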



>> 3) Not consistent with adding per-object metadata

> I strongly disagree.

> It seems to me that adding per-object data would be better suited to a Cos
> construct, where you could make an indirect reference to the object in
> question.

You have my comment backwards.


Right now, most high-level Cos/PD objects can have a /Metadata key attached to them, which points to the EXACT SAME kind of XMP stream that one might use at the document level.
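For example (object numbers and lengths invented), a page can point at its own XMP packet, in exactly the same form as the document-level one:

  5 0 obj
  << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Metadata 8 0 R >>
  endobj

  8 0 obj
  << /Type /Metadata /Subtype /XML /Length 890 >>
  stream
  <x:xmpmeta xmlns:x="adobe:ns:meta/"> ... </x:xmpmeta>
  endstream
  endobj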

That EXACT SAME XMP stream is also used by all the other Adobe applications (and 3rd party applications) - so that when you author a document in Photoshop with extended metadata, place it into InDesign and then produce a PDF, the XML stream is passed along the authoring chain w/o any data loss. It can also be easily updated, should you "back up" the authoring process.

If the PDF metadata were Cos-based, it would have to be transformed from the other formats used by the other applications. Not a good idea...


> Nearly granted, but not quite.  PDF is a standard, though one in corporate
> control.  Quite a few engines out there do search PDF, with several
> open-source tools floating about to help.  Now those tools have to expand to
> cover a whole new format.

Actually, it's the other way around ;).


There are MANY MORE XML-based solutions (and XML toolkits) already in existence than there are PDF options - and they can all work with PDFs and their metadata WITHOUT ANY CHANGES. (though there are issues here with incremental update tables, but let's table that piece).

This is why the /Metadata stream is uncompressed...
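In the raw file, the packet is bracketed by fixed markers that any XML/XMP-aware tool can scan for without understanding Cos at all - roughly like this (the begin attribute normally carries a byte-order mark, and the trailing padding is omitted here):

  <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
  <x:xmpmeta xmlns:x="adobe:ns:meta/">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      ...
    </rdf:RDF>
  </x:xmpmeta>
  <?xpacket end="w"?>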


> I still don't see the benefit in inflicting this additional pain on PDF
> developers. Now we HAVE to get into XML in a variety of cases (such as this
> one).

Understood, but get used to it ;).


PDF 1.4 introduced the XML/XMP metadata.

PDF 1.5 introduced XML for forms (XFA).

Who knows what future versions of PDF will use XML for?!?!!


>> My hope is that DocInfo is going away...

> I'd like to see a single source of information too.  Which source?  We
> disagree.

True, but the writing is on the wall.


The fact that Acrobat 6 ignores DocInfo in favor of /Metadata should be a clear indication to you of where Adobe is going...


>> Do keep in mind that there are multiple namespaces
>> within the XMP that also need to be kept in sync - DublinCore, pdf, etc.

> I fail to see how this is superior to adding data to the CosInfo dictionary.
> It seems like far more effort than it's worth.

Spend some time talking to archiving/document management folks about metadata - it is QUITE enlightening...



>> It's actually pretty easy, if you understand XML concepts...

> I've worked indirectly with various XML parsers in the past.  I was more
> concerned with human readability, though that thought didn't come out at all in
> my list post.  Reading that XML wasn't easy.

Although one of the original goals for XML was human consumption, that doesn't imply that all XML grammar must adhere to that principle...And admittedly, if you take the namespaces out of the picture, it's pretty easy to read/follow and not all that different from /Info.


Namespaces, however, do complicate things.
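For comparison, the same information in both places (a rough, hand-written sketch):

  % the classic /Info entries
  << /Title (Annual Report) /Author (J. Smith) >>

  <!-- the XMP equivalent, namespaces and all -->
  <rdf:Description rdf:about=""
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Annual Report</rdf:li></rdf:Alt></dc:title>
    <dc:creator><rdf:Seq><rdf:li>J. Smith</rdf:li></rdf:Seq></dc:creator>
  </rdf:Description>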


>> You could - that is a valid approach to the problem...

> ...providing there's no additional data in there that Acrobat doesn't know
> how to generate automatically. (just to shoot a hole in my solution)

Correct - and that would be an assumption...



> My conclusion (which could be wrong, Lord knows I've been wrong before) is
> that this XML was tacked on as a marketing bullet point or some techy pet
> project, rather than for sound technical reasons.

Nope, it was actually well thought out as part of a global XML-based metadata strategy at Adobe.



> Nonetheless, it's here, and we have to deal with it.  Is that XML metadata
> documented anywhere?  I haven't seen it.

Adobe's XMP documentation and associated XMP toolkit...



Leonard


---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                             215-938-0880 (fax)




