RE: [PDFdev] PDF Info

Mark Storer Mon, 19 Apr 2004 12:32:04 -0700

PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________



Preface:  I'm not trying to be antagonistic, I have genuine questions.
Sounds like you're more familiar with this territory than I.  I've been
exposed to XML, but in much the same way some have been exposed to sex
offenders in a park: Cringe and look away.  XSLT was not my cup of tea,
though I can see uses for it.

>          A number of things...
> 
> 1) Not flexible/extensible enough.

What data can you represent in XML that you can't do in Cos?  Key/value
pairs, strings, numbers, binary data.  All exist in both formats, and both
seem equally extensible, at least to my eye.

Once in the XML world, you can rope in other related technologies to give
added benefit, but that's piling on more technologies to PDF that wouldn't
have been necessary had they stuck to Cos.
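
To make the "both formats carry the same data" point concrete, here is a minimal sketch that writes one set of key/value pairs both ways.  The function names and the flat <info> element layout are made up for illustration; this is neither the XMP schema nor a complete Cos writer (no nested dictionaries, arrays, or name escaping, and only plain numbers).

```python
from xml.sax.saxutils import escape

def to_cos(info):
    """Render a flat dict as a Cos (PDF) dictionary literal. Illustrative only."""
    parts = []
    for key, value in info.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, (int, float)):
            text = str(value)
        elif isinstance(value, bytes):
            # Binary data as a Cos hex string.
            text = "<" + value.hex().upper() + ">"
        else:
            # Literal string: escape backslashes and parentheses.
            escaped = (str(value).replace("\\", "\\\\")
                                 .replace("(", "\\(")
                                 .replace(")", "\\)"))
            text = "(" + escaped + ")"
        parts.append("/%s %s" % (key, text))
    return "<< " + " ".join(parts) + " >>"

def to_xml(info):
    """Render the same dict as flat XML elements (hypothetical layout, not XMP)."""
    parts = []
    for key, value in info.items():
        if isinstance(value, bytes):
            text = value.hex().upper()
        else:
            text = escape(str(value))
        parts.append("<%s>%s</%s>" % (key, text, key))
    return "<info>" + "".join(parts) + "</info>"

print(to_cos({"Title": "PDF Info", "Pages": 3, "ID": b"\x01\xff"}))
print(to_xml({"Title": "PDF Info", "Pages": 3, "ID": b"\x01\xff"}))
```

Either output carries the same four kinds of value; the difference is in the surrounding tooling, not in what can be expressed.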

> 2) Outside the /Catalog structure of the PDF - thus making it 
> indirectly related to the document

I don't see this as a problem.  And if it were, they could have linked the
DocInfo dict from the root as well as the trailer.

> 3) Not consistent with adding per-object metadata

I strongly disagree.

It seems to me that adding per-object data would be better suited to a Cos
construct, where you could make an indirect reference to the object in
question.  With XML, you'd have to go hunting in some way.  A structure
lookup, or a form field's name would have to be hunted down.  Anything else
is nigh off-limits.  Resources aren't indexed at the document level.  You
could explicitly mention the object's number [and generation], but now you're
shoehorning Cos-isms into the XML (XMP, was it?).

I'd rather have the Cos.  [Not that I'll get it]
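
A rough sketch of what I mean by pointing at the object directly, modelled in Python rather than real Cos.  Everything here is hypothetical: the Ref tuple stands in for an indirect reference such as "12 0 R", the object table is a toy, and the /Target key is invented for the example and is not in the PDF specification.

```python
from collections import namedtuple

# Hypothetical stand-in for a Cos indirect reference: "12 0 R" becomes Ref(12, 0).
Ref = namedtuple("Ref", ["number", "generation"])

# Toy object table keyed by (object number, generation).
objects = {
    (12, 0): {"Type": "XObject", "Subtype": "Image"},
    (13, 0): {"Type": "Font", "BaseFont": "Helvetica"},
    # A per-object metadata dictionary that points straight at its target.
    # "Target" is an invented key, used only for this illustration.
    (30, 0): {"Target": Ref(12, 0), "Author": "Mark", "Note": "scanned logo"},
}

def resolve(ref):
    """Follow an indirect reference to the object it names."""
    return objects[(ref.number, ref.generation)]

def metadata_for(target):
    """Return every metadata dictionary whose Target is the given reference."""
    return [obj for obj in objects.values() if obj.get("Target") == target]

print(resolve(objects[(30, 0)]["Target"]))
print(metadata_for(Ref(12, 0)))
```

The reference identifies the image unambiguously, with no need for a structure lookup or a field name to hunt by.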

> 4) Non-standards based for integration with databases/search 
> engines, etc.

Nearly granted, but not quite.  PDF is a standard, though one in corporate
control.  Quite a few engines out there do search PDF, with several
open-source tools floating about to help.  Now those tools have to expand to
cover a whole new format.

> (etc)

I still don't see the benefit in inflicting this additional pain on PDF
developers.  Now we HAVE to get into XML in a variety of cases (such as this
one).  Admittedly, it can be argued that dragging programmers into the 21st
century is a good thing.  On the other hand, this is an example of
"coupling", tying things together.  Strong coupling in the programming world
is a bad thing, particularly when it can be avoided (by never having
introduced XML metadata in the first place, or by stamping it out now that
it's here).

>          My hope is that DocInfo is going away...

I'd like to see a single source of information too.  Which source?  We
disagree.

> 
>          Correct.  If you are reading, you should first read 
> the /Metadata 
> and then if not present there, fall back to the /Info fields.
> 
>          Do keep in mind that there are multiple namespaces 
> within the XMP 
> that also needs to be kept in sync - DublinCore, pdf, etc.

I fail to see how this is superior to adding data to the CosInfo dictionary.
It seems like far more effort than it's worth.
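
For reference, the read order Leonard describes (the /Metadata first, then the /Info fields) could be sketched like this.  It assumes both sources have already been parsed into plain Python dicts keyed by field name; the function and parameter names are mine, not from any library.

```python
def read_field(name, xmp_fields, info_dict):
    """Prefer the value from the XMP metadata; fall back to the Info dictionary.

    Both arguments are assumed to be plain dicts produced by some earlier
    parsing step. A missing key or a None value counts as "not present".
    """
    value = xmp_fields.get(name)
    if value is not None:
        return value
    return info_dict.get(name)

xmp = {"Title": "PDF Info (from XMP)"}
info = {"Title": "PDF Info (from Info)", "Author": "Mark"}

print(read_field("Title", xmp, info))     # taken from the XMP dict
print(read_field("Author", xmp, info))    # not in XMP, so taken from Info
print(read_field("Subject", xmp, info))   # in neither, so None
```

Note this only covers reading; a writer still has to update both places, plus the duplicate namespaces inside the XMP, to keep them in sync.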

> 
> 
> >And could they please throw a couple more namespaces into 
> that XML?  I just
> >don't think *S*E*V*E*N* is enough!  I weep for the poor 
> fools who actually
> >have to parse it.
> 
>          It's actually pretty easy, if you understand XML concepts...

I've worked indirectly with various XML parsers in the past.  I was more
concerned with human-reading, though that thought didn't come out at all in
my list post.  Reading that XML wasn't easy.
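
In fairness, the machine-reading side is short with a namespace-aware parser.  A minimal sketch using Python's standard xml.etree.ElementTree follows; the sample packet is hand-written for illustration (a real one is longer and carries more namespaces), and read_xmp is a name I made up.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookups below.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Hand-written sample packet, trimmed down for illustration.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title>
    <rdf:Alt><rdf:li xml:lang="x-default">PDF Info</rdf:li></rdf:Alt>
   </dc:title>
   <pdf:Producer>Example Producer</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

def read_xmp(packet):
    """Pull the Dublin Core title and the pdf:Producer out of a packet."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    producer = root.find(".//pdf:Producer", NS)
    return {
        "Title": title.text if title is not None else None,
        "Producer": producer.text if producer is not None else None,
    }

print(read_xmp(SAMPLE))
```

Even in this cut-down form, the title sits four elements deep across three namespaces, which is the human-readability complaint in a nutshell.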

> 
> 
> >Or you could just blow the metadata away.
> 
>          You could - that is a valid approach to the problem...

...providing there's no additional data in there that Acrobat doesn't know
how to generate automatically. (just to shoot a hole in my solution)

> Leonard


My conclusion (which could be wrong, Lord knows I've been wrong before) is
that this XML was tacked on as a marketing bullet point or some techy pet
project, rather than for sound technical reasons.

Nonetheless, it's here, and we have to deal with it.  Is that XML metadata
documented anywhere?  I haven't seen it.  Its PRESENCE is mentioned in the
PDF specification (10.2.2), but not its contents.  There's a nice list of
things which won't be there, but not much on what will.


Mark

To change your subscription:
http://www.pdfzone.com/discussions/lists-pdfdev.html
